Abstract

Recently, Bollobás, Janson and Riordan introduced a very general family
of random graph models, producing inhomogeneous random graphs
with O(n) edges. Roughly speaking, there is one model for each kernel,
i.e., each symmetric measurable function from $[0,1]^2$ to the non-negative
reals, although the details are much more complicated, to ensure the exact
inclusion of many of the recent models for large-scale real-world networks.

A different connection between kernels and random graphs arises in
the recent work of Borgs, Chayes, Lovász, Sós, Szegedy and Vesztergombi.
They introduced several natural metrics on dense graphs (graphs with
$n$ vertices and $\Theta(n^2)$ edges), showed that these metrics are equivalent,
and gave a description of the completion of the space of all graphs with 
respect to any of these metrics in terms of graphons, which are essentially 
bounded kernels. One of the most appealing aspects of this work is the 
message that sequences of inhomogeneous quasi-random graphs are in a 
sense completely general: any sequence of dense graphs contains such a 
subsequence. Alternatively, their results show that certain natural models 
of dense inhomogeneous random graphs (one for each graphon) cover the 
space of dense graphs: there is one model for each point of the completion, 
producing graphs that converge to this point. 

Our aim here is to briefly survey these results, and then to investigate
to what extent they can be generalized to graphs with $o(n^2)$ edges.
Although many of the definitions extend in a simple way, the connections
between the various metrics, and between the metrics and random
graph models, turn out to be much more complicated than in the dense
case. We shall prove many partial results, and state even more conjectures
and open problems, whose resolution would greatly enhance the currently
rather unsatisfactory theory of metrics on sparse graphs. This paper deals
mainly with graphs with $o(n^2)$ but $\omega(n)$ edges: a companion paper will
discuss the (more problematic still) case of extremely sparse graphs, with
$O(n)$ edges.

1 Introduction

In recent years, much work has been done constructing and analyzing mathematical
models of real-world networks. The random graphs in these models are
inhomogeneous; in fact, many of them have degree sequences with power law
distributions. In [5], Bollobás, Janson and Riordan defined a very general model
of an $n$-vertex random graph $G(n,\kappa)$ with conditional independence between the
edges which includes as special cases many of the models of real-world networks
that have been studied, and proved numerous results about the random graphs
generated by this model, including results about their component structure and
the point and nature of the phase transition in them. Here the kernel $\kappa$ is a
symmetric measurable function from $[0,1]^2$ to $[0,\infty)$ satisfying some mild conditions.
(Some of these conditions arise due to the very general nature of other
parts of the model, and can be weakened in other contexts; see [S] and [TH] for a
discussion of this.) Just like the real-world graphs that motivated the construction
of the BJR model, the random graphs $G(n,\kappa)$ are sparse in the sense that
the expected number of edges is $\Theta(n)$ (in fact, $(c+o(1))n$ for some constant $c$).
In [5] the kernel $\kappa$ was used to define a multi-type branching process $X_\kappa$ whose
survival probability is closely related to the component structure of $G(n,\kappa)$.

In order to decide how well our random graph $G(n,\kappa)$ approximates a given
real-world graph $G_n$, it would be desirable to establish a distance between a
random graph model and a graph, so that the approximation is judged to be
better and better as the distance tends to 0. Putting it slightly differently, we
should like to define a metric on the set of sparse finite graphs so that a Cauchy
sequence consists of graphs that are in some sense 'similar', and the limit of
such a (not eventually constant) sequence is naturally identified with a suitable
random graph model. For dense graphs, graphs with $n$ vertices and at least
$cn^2$ edges, such a program has been carried out very successfully in a series of
papers by (various subsets of) Borgs, Chayes, Lovász, Sós, Szegedy and Vesztergombi
(see [T3J [TH [33J GUI [HI HH| and the references therein). In particular, they
introduced several metrics on the space of dense finite (weighted) graphs and
showed them to be equivalent. The limiting objects, i.e., the additional points
in the completion, turn out to be graphons, that is, bounded symmetric measurable
functions from $[0,1]^2$ to $\mathbb{R}$. The corresponding random graph models,
called $W$-random graphs in [34], are the natural dense version of $G(n,\kappa)$; see
Subsection 2.3.

The only difference between kernels and graphons is that the latter are 
bounded, while the former must be allowed to be unbounded in order to model, 
for example, highly inhomogeneous real-world networks. In many fundamental 
questions (for example those concerning the phase transition), this difference is 
substantial. The appearance of graphons or kernels in the two different con- 
texts described above suggests the existence of interesting connections between 
these areas. One such connection is described by Bollobás, Borgs, Chayes and
Riordan [7], who study (sparse) random subgraphs of arbitrary dense graphs;
this has recently been extended by Bollobás, Janson and Riordan [10].

We have several aims in this paper. First, we shall review some of the results
of Borgs, Chayes, Lovász, Sós, Szegedy and Vesztergombi mentioned above. Our
main aim is then to take the first tentative steps towards a general theory of
metrics on sparse graphs; in particular, we shall investigate to what extent these
ideas can be carried over to the sparse setting, and what can be said about the
connection between the metrics and the ideas of Bollobás, Janson and Riordan.
As we shall see, the difficulties that arise are considerably greater than in the
dense case; in fact, the difficulties increase as the graphs get sparser. The almost
dense case $e(G_n) = n^{2-o(1)}$ is already rather different from the dense case; the
extremely sparse case $e(G_n) = \Theta(n)$, which will be studied in a companion
paper [11], is very different indeed, having many novel features. We shall prove
numerous results, but the picture we obtain is much less complete than that
obtained by Borgs et al. in the dense case. In fact, perhaps our most important
aim is to identify some of the main problems and conjectures whose resolution
would enhance the theory of metrics on sparse graphs.

An important tool in the study of metrics on spaces of dense graphs is Szemerédi's
Regularity Lemma. While there is a version of Szemerédi's Lemma for
sparse graphs (with $o(n^2)$ but $\omega(n)$ edges) satisfying a mild additional condition,
there is no satisfactory counting/embedding lemma for counting (or even
finding) small subgraphs using regular partitions. This is one of the reasons
why sparse graphs are much more difficult to handle than dense ones. One of
our main aims is to prove such a counting lemma for certain subgraphs, greatly
extending a result of Chung and Graham [17].

The rest of the paper is organized as follows. The next section is about
dense graphs and kernels; we start by briefly recalling some of the definitions
and results of Borgs, Chayes, Lovász, Sós, Szegedy and Vesztergombi whose generalization
we shall discuss, focussing in particular on the cut metric. Then, in
Subsection 2.4 we show that these results are closely connected to the question
of when two kernels are 'equivalent'; we shall need this notion of equivalence
when we come to sparse graphs.

The rest of the paper concerns sparse graphs, i.e., graphs with $n$ vertices and
$o(n^2)$ edges: in Section 3 we consider subgraph counts in sparse (but mostly
not too sparse) graphs, stating a conjecture that generalizes the main result of
Lovász and Szegedy [34], and proving various partial results, concentrating especially
on the uniform case, i.e., on sparse quasi-random graphs. In Section 4 we
turn to Szemerédi's Lemma for sparse graphs satisfying an appropriate 'bounded
density' assumption, and the consequences for questions of convergence in the
cut metric.

Section 5 is the longest and most important section of the paper. In it we
discuss the relationship between the cut metric and the count metric (to be defined)
in the sparse case. As well as proposing various conjectures extending the
results of Borgs, Chayes, Lovász, Sós and Vesztergombi, we prove several partial
results, amounting to 'sparse counting lemmas' with various assumptions; these
results, Theorem 5.14 and its variants Theorems 5.15 and 5.17, are the most
substantial results in the paper.

In Section 6 we briefly discuss another metric considered by Borgs, Chayes,
Lovász, Sós and Vesztergombi, showing that for graphs that are sparse, but
not too sparse, it is equivalent to the cut metric. In the extremely sparse case,
considering graphs with bounded average degree, the partition metric turns out
to be much more useful than the cut metric. This and a discussion of the many
problems and interesting open questions concerning metrics on extremely sparse
graphs will be the topic of a companion paper [11].

In Section 7 we return briefly to the relationship between metrics and random
graph models, and close with some final remarks summarizing our main results 
and conjectures. 

Throughout the paper we use standard graph theoretic notation as in [4]. For
example, $|G|$ and $e(G)$ denote respectively the number of vertices and number
of edges of a graph $G$.



2 Dense graphs 

There are many natural definitions of what it means for two graphs to be 'close',
and corresponding metrics and notions of Cauchy/fundamental sequences. These
tend to be particularly natural for 'dense' graphs, with $\Theta(n^2)$ edges. Several
of these metrics have been studied by Borgs, Chayes, Lovász, Sós and Vesztergombi
[mE], who showed that they are equivalent, and that there is a natural
completion of the space of graphs under any of these metrics. In this section we
briefly recall some of these definitions and results; we are not aiming to give a
comprehensive survey of the results of these papers, discussing only those that
will be relevant for us here. Although most of the results mentioned in Subsections
2.1-2.3 will be from Lovász and Szegedy [35] and [13] dH] , we shall not
always adopt their notation or terminology, or indeed follow their definitions
exactly.

Borgs, Chayes, Lovász, Sós and Vesztergombi [15], [16] consider weighted
graphs, with weights on the edges and on the vertices. For the results we shall 
describe, this makes essentially no difference. In what follows, we consider only 
unweighted graphs; while much of what we shall say presumably carries over to 
suitably weighted graphs, the definitions for weighted graphs are not as natural 
in the sparse case, and are likely to introduce more additional complications 
than new insights. 

2.1 The subgraph distance 

The basic starting point is to consider, for each fixed graph $F$, the number
of copies of $F$ in a large graph $G$, i.e., the number $X_F(G)$ of subgraphs of $G$
isomorphic to $F$. Recall that a homomorphism from a graph $F$ to a graph $G$ is
a function $\phi : V(F) \to V(G)$ such that $\phi(x)\phi(y) \in E(G)$ whenever $xy \in E(F)$.
Although $X_F(G)$ (for example, the number of triangles in $G$) is the most natural
basic notion in this context, it turns out to be cleaner to work with $\mathrm{emb}(F,G)$,
the number of injective homomorphisms or embeddings of $F$ into $G$. Note that

$$\mathrm{emb}(F,G) = \mathrm{aut}(F)\,X_F(G),$$

so $X_F(G)$ and $\mathrm{emb}(F,G)$ contain the same information. Working with the latter
avoids constant factors $\mathrm{aut}(F)$ in many formulae.

If $F$ has $k$ vertices, then for $n \ge k$ we have
$\mathrm{emb}(F,K_n) = n_{(k)} = n(n-1)\cdots(n-k+1)$, so the natural normalization is to work with

$$s(F,G) = \frac{\mathrm{emb}(F,G)}{n_{(k)}} = \frac{X_F(G)}{X_F(K_n)} \in [0,1],$$

where, as usual, $n = |G|$ is the number of vertices of $G$. If $|F| > |G|$ then the
above ratio is not defined, and we set $s(F,G) = 0$.
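
To make the normalization concrete, the following brute-force sketch computes $\mathrm{emb}(F,G)$ and $s(F,G)$ for small graphs. It is an illustration only and not part of the original argument: the function names `emb`, `falling` and `s`, and the representation of a graph as a vertex count together with an edge list on the vertices 0, ..., n-1, are assumptions made for this example.

```python
from fractions import Fraction
from itertools import permutations


def emb(F_edges, k, G_edges, n):
    """Count embeddings (injective homomorphisms) of F into G by brute force.

    F has vertex set {0, ..., k-1}, G has vertex set {0, ..., n-1}
    (a representation assumed for this sketch).
    """
    G = {frozenset(e) for e in G_edges}
    count = 0
    # permutations(range(n), k) enumerates all injective maps V(F) -> V(G).
    for phi in permutations(range(n), k):
        if all(frozenset((phi[x], phi[y])) in G for x, y in F_edges):
            count += 1
    return count


def falling(n, k):
    """Falling factorial n_(k) = n (n-1) ... (n-k+1) = emb(F, K_n)."""
    out = 1
    for i in range(k):
        out *= n - i
    return out


def s(F_edges, k, G_edges, n):
    """Normalized embedding density s(F, G); set to 0 when |F| > |G|."""
    if k > n:
        return Fraction(0)
    return Fraction(emb(F_edges, k, G_edges, n), falling(n, k))


triangle = [(0, 1), (0, 2), (1, 2)]
K4 = [(i, j) for i in range(4) for j in range(i + 1, 4)]
C4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
density_in_K4 = s(triangle, 3, K4, 4)   # every injective map is an embedding
density_in_C4 = s(triangle, 3, C4, 4)   # C4 is triangle-free
```

The enumeration takes time of order $n^k$, so this is only meant for checking small cases by hand.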
Let $\mathcal{F}$ denote the set of isomorphism classes of finite graphs; sometimes it will
be convenient to enumerate $\mathcal{F}$ in an arbitrary way, writing $\mathcal{F} = \{F_1, F_2, \ldots\}$.
(More formally, we shall take each $F_i$ to be a representative of an isomorphism
class.) The graph parameters $s(F,\cdot)$, $F \in \mathcal{F}$, define a natural family of equivalent
metrics on $\mathcal{F}$, by mapping $\mathcal{F}$ into $[0,1]^\infty$ (or into $[0,1]^{\mathcal{F}}$). Indeed, for any
finite graph $G$, set

$$s(G) = (s_i(G))_{i=1}^{\infty} \in [0,1]^\infty,$$

where $s_i(G) = s(F_i,G)$. Let $d$ be any metric on $X = [0,1]^\infty$ which gives the
product topology, for example $d(s,t) = \sum_{i=1}^\infty 2^{-i}|s_i - t_i|$. We may define the
subgraph distance of two graphs $G_1$, $G_2$ as

$$d_{\mathrm{sub}}(G_1,G_2) = d(s(G_1), s(G_2)).$$

It is easy to see that this defines a metric on $\mathcal{F}$: indeed, given $G \in \mathcal{F}$, among
graphs $F$ with $s(F,G) > 0$, there is a unique graph with $|F| + e(F)$ maximal,
namely $G$. Thus the map $G \mapsto s(G)$ is injective. Furthermore, considering
$s(E_{n+1},G)$, where $E_{n+1}$ is the empty graph with $n+1$ vertices, we see that the
distance between any graph $G$ with $n$ vertices and the set of graphs with more
than $n$ vertices is positive. It follows that the metric space $(\mathcal{F}, d_{\mathrm{sub}})$ is discrete.

A sequence $(G_n)$ of graphs is Cauchy with respect to $d_{\mathrm{sub}}$ if and only if,
for each $F \in \mathcal{F}$, the sequence $s(F,G_n)$ converges. Such sequences are sometimes
called 'convergent', although they do not converge in the metric space $(\mathcal{F}, d_{\mathrm{sub}})$.
Note that if $(G_n)$ is Cauchy then, since $(\mathcal{F}, d_{\mathrm{sub}})$ is discrete, either $(G_n)$ is
eventually constant, or $|G_n| \to \infty$.

Many minor variations on the definition of $d_{\mathrm{sub}}$ are possible. For example,
instead of considering the number of embeddings of $F$ into $G$, one can consider
the number $\mathrm{hom}(F,G)$ of homomorphisms from $F$ to $G$. If $|F| = k$ and $|G| = n$,
then the number of non-injective homomorphisms from $F$ to $G$ is at most
$\binom{k}{2} n^{k-1} = O(n^{k-1})$, so setting

$$t(F,G) = \mathrm{hom}(F,G)/n^k$$

we have

$$t(F,G) = s(F,G) + O(n^{-1}) \qquad (1)$$

for each $F$. Hence, in this dense case, the parameters $s(F,\cdot)$ and $t(F,\cdot)$ are
essentially equivalent. [There is a minor difference that, working with homomorphisms,
one ends up with a pseudo-metric: if $G$ is any graph and $G^{(r)}$ is the
blow-up of $G$ obtained by making $r$ copies of each vertex, joined to all copies of
its neighbours, then $t(F,G^{(r)}) = t(F,G)$ for all $F \in \mathcal{F}$ and $r \ge 1$.] Also, one
can pass easily back and forth between subgraph counts and counts of induced
subgraphs using inclusion-exclusion.
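
The invariance of $t$ under blow-ups can be checked directly on small examples. The sketch below is illustrative only; the helper names `hom`, `t` and `blow_up`, and the use of a 0/1 adjacency matrix to represent $G$, are assumptions of this example rather than notation from the text.

```python
from fractions import Fraction
from itertools import product


def hom(F_edges, k, adj):
    """Count all homomorphisms from F (vertices 0..k-1) to the graph with
    0/1 adjacency matrix adj, including the non-injective ones."""
    n = len(adj)
    return sum(
        all(adj[phi[x]][phi[y]] for x, y in F_edges)
        for phi in product(range(n), repeat=k)
    )


def t(F_edges, k, adj):
    """Homomorphism density t(F, G) = hom(F, G) / n^k, as an exact fraction."""
    return Fraction(hom(F_edges, k, adj), len(adj) ** k)


def blow_up(adj, r):
    """Adjacency matrix of G^(r): vertex v of the blow-up is a copy of
    vertex v // r of G, and copies are joined iff the originals are."""
    n = len(adj)
    return [[adj[i // r][j // r] for j in range(n * r)] for i in range(n * r)]


triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
path = [(0, 1), (1, 2)]          # path with two edges on vertices 0, 1, 2
before = t(path, 3, triangle)
after = t(path, 3, blow_up(triangle, 2))
# before == after, since hom(F, G^(r)) = r^k hom(F, G) and |G^(r)| = r |G|.
```

Exact rational arithmetic is used so that the equality $t(F,G^{(r)}) = t(F,G)$ can be tested without rounding issues.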

One of the key properties of the metric $d_{\mathrm{sub}}$ is that there is a natural description
of the (clearly compact) completion of $(\mathcal{F}, d_{\mathrm{sub}})$, in terms of standard
kernels (also called graphons). Here a kernel is a symmetric measurable function
from $[0,1]^2$ to $[0,\infty)$; a standard kernel is one taking values in $[0,1]$. In other
contexts, one considers more general bounded kernels, taking values in $[0,M]$ or
$[-M,M]$, $M > 0$, or general signed kernels taking values in $\mathbb{R}$. One can extend
the definition of $s(F,G)$ (or of $t(F,G)$) to kernels in a natural way: given a finite
graph $F$ with vertex set $\{1,2,\ldots,k\}$, let

$$s(F,\kappa) = \int_{[0,1]^k} \prod_{ij \in E(F)} \kappa(x_i,x_j) \prod_{i=1}^k dx_i. \qquad (2)$$

(Some authors use the notation $t(F,\kappa)$ for the same quantity.) This formula has
a natural interpretation as the normalized 'number' of embeddings of $F$ into a
weighted graph with the uncountable vertex set $[0,1]$, with edge weights given
by $\kappa$. Of course, in this context there is no difference between embeddings and
homomorphisms.

Lovász and Szegedy [34] proved (essentially) the following result.

Theorem 2.1. Let $(G_n)$ be a Cauchy sequence in $(\mathcal{F}, d_{\mathrm{sub}})$. Then either $(G_n)$ is
eventually constant, or there is a standard kernel $\kappa$ such that $s(F,G_n) \to s(F,\kappa)$. □



Let us remark that the result proved in [34] concerns $t$ rather than $s$, which
makes no difference, except that a separate case for eventually constant sequences
is then not needed. Here, the distinction is informative: considering
the parameters $s(E_k,G_n)$ for each $k$ shows that in the second case above we
have $|G_n| \to \infty$.

Of course, (2) allows one to extend the metric $d_{\mathrm{sub}}$ to standard kernels,
obtaining in the first instance a pseudo-metric on the set of standard kernels.
There is a natural notion of equivalence for kernels, which one can think of as a
two dimensional version of the equivalence relation on random variables given
by $X \sim Y$ if $X$ and $Y$ have the same distribution; the details are somewhat
technical, and not essential for understanding the metrics discussed here, so
we postpone them to Subsection 2.4. We write $\sim$ for this relation, and $\mathcal{K}$ for
the set of equivalence classes of standard kernels under $\sim$. Borgs, Chayes and
Lovász [12] have shown that $\kappa_1 \sim \kappa_2$ if and only if $d_{\mathrm{sub}}(\kappa_1,\kappa_2) = 0$ (see also
Theorem 2.8), so $d_{\mathrm{sub}}$ induces a metric on $\mathcal{K}$. The metric space $(\mathcal{K}, d_{\mathrm{sub}})$ is
complete (the result about Cauchy sequences of graphs above applies just as
well to standard kernels). Hence, the completion of $(\mathcal{F}, d_{\mathrm{sub}})$ is obtained by
adding to $\mathcal{F}$ the set $\mathcal{K}$ of all equivalence classes of standard kernels, and using
the map $s : \mathcal{F} \cup \mathcal{K} \to [0,1]^\infty$ to extend $d_{\mathrm{sub}}$ to $\mathcal{F} \cup \mathcal{K}$.

There is a natural way to associate a standard kernel $\kappa_G$ to a graph $G$ with
$n$ vertices: divide $[0,1]$ into $n$ intervals $I_1,\ldots,I_n$ of equal length (we may and
shall ignore the question of which endpoints are included), and set $\kappa_G$ to be 1
on $I_i \times I_j$ if $ij \in E(G)$, and 0 otherwise. One slight advantage of using $t$ rather
than $s$ is that

$$t(F,G) = s(F,\kappa_G)$$

for all graphs $F$ and $G$. However, the metric obtained using $t$ is only a pseudo-metric,
since graphs on different numbers of vertices may correspond to the
same kernel, for example if one is a blow-up of the other.

We say that a kernel $\kappa$ is of finite type if there is a partition of $[0,1]$ into
measurable sets $A_1,\ldots,A_k$ so that $\kappa$ is constant on each of the rectangles $A_i \times A_j$.
Note that $\kappa_G$ is always of finite type.

2.2 The cut distance 

Borgs, Chayes, Lovász, Sós and Vesztergombi [TO] considered another natural
metric on graphs or kernels, namely, the cut metric, based on a norm used by
Frieze and Kannan [23]. For any integrable function $\kappa : [0,1]^2 \to \mathbb{R}$, its cut
norm $\|\kappa\|_{\mathrm{cut}}$ is defined by

$$\|\kappa\|_{\mathrm{cut}} = \sup_{S,T \subseteq [0,1]} \left| \int_{S \times T} \kappa(x,y)\,dx\,dy \right|, \qquad (3)$$

where the supremum is over all pairs of measurable subsets of $[0,1]$. It is easily
seen that this defines a norm on $L^\infty([0,1]^2)$. In fact, there are several variations
of this definition: one can take

$$\|\kappa\|_{\mathrm{cut}} = \sup_{S \subseteq [0,1]} \left| \int_{S \times S^c} \kappa(x,y)\,dx\,dy \right|, \qquad (4)$$

where $S^c = [0,1] \setminus S$, or one can take the supremum in (3) only over sets $S$, $T$
with $S \cap T = \emptyset$. It is easy to check that these variations only affect the norm up
to an (irrelevant) constant factor (see [15]), so we shall feel free to use whichever
definition is most convenient in any given context.

There is yet another definition of $\|\kappa\|_{\mathrm{cut}}$ that is more natural from the point
of view of functional analysis, namely

$$\|\kappa\|_{\mathrm{cut}} = \sup_{\|f\|_\infty, \|g\|_\infty \le 1} \int_{[0,1]^2} \kappa(x,y) f(x) g(y)\,dx\,dy,$$

where the supremum is taken over all pairs of measurable functions from $[0,1]$
to $[-1,+1]$. Since the integral above is linear with respect to each of $f$ and $g$,
the supremum is attained at some functions taking values in $\{-1,+1\}$, and it
follows immediately that this version of the cut norm is again within a constant
factor of that defined by (3). As noted in [TO], for example, this last definition is
the most natural from the point of view of functional analysis: it is the dual of
the projective tensor product norm in $L^\infty \otimes L^\infty$, and is thus the injective tensor
product norm in $L^1 \otimes L^1$. Equivalently, this is just the norm of the integral
operator with kernel $\kappa$, treated as a map from $L^\infty$ to $L^1$.

Before turning to the cut metric we need one further definition. Given a
kernel $\kappa$ and a measure-preserving map $\tau : [0,1] \to [0,1]$, let $\kappa^{(\tau)}$ be the kernel
defined by

$$\kappa^{(\tau)}(x,y) = \kappa(\tau(x),\tau(y)). \qquad (5)$$

If $\tau$ is a bijection, then we call $\tau$ a rearrangement of $[0,1]$, and $\kappa^{(\tau)}$ a rearrangement
of $\kappa$. (It is perhaps more natural to consider measure-preserving bijections
between two subsets of $[0,1]$ with measure 1; this makes no difference.) Two
kernels $\kappa_1$ and $\kappa_2$ are naively equivalent if one is a rearrangement of the other,
more precisely, if there is a rearrangement $\tau$ of $[0,1]$ such that

$$\kappa_1(x,y) = \kappa_2^{(\tau)}(x,y) \quad \text{for a.e. } (x,y) \in [0,1]^2. \qquad (6)$$

In this case we write $\kappa_1 \approx \kappa_2$, noting that $\approx$ is an equivalence relation.

The cut metric $d_{\mathrm{cut}}$ on the set of standard kernels may be defined as follows:

$$d_{\mathrm{cut}}(\kappa_1,\kappa_2) = \inf_{\kappa_2' \approx \kappa_2} \|\kappa_1 - \kappa_2'\|_{\mathrm{cut}}. \qquad (7)$$

Clearly, this defines a pseudo-metric on standard kernels; in particular, if $\kappa_1 \approx \kappa_2$,
then $d_{\mathrm{cut}}(\kappa_1,\kappa_2) = 0$. The reverse implication does not hold; in fact,
$d_{\mathrm{cut}}(\kappa_1,\kappa_2) = 0$ if and only if $\kappa_1 \sim \kappa_2$, where $\sim$ is the equivalence relation
to be defined in Subsection 2.4. Hence, $d_{\mathrm{cut}}$ induces a metric on the set $\mathcal{K}$ of
equivalence classes of standard kernels under the relation $\sim$.

As noted above, there is a standard kernel $\kappa_G$ naturally associated to each
graph $G$, although the map $G \mapsto \kappa_G$ from $\mathcal{F}$ to $\mathcal{K}$ is not injective. One extends
the cut metric to a pseudo-metric on graphs by setting

$$d_{\mathrm{cut}}(G_1,G_2) = d_{\mathrm{cut}}(\kappa_{G_1},\kappa_{G_2}), \qquad (8)$$

and to $\mathcal{F} \cup \mathcal{K}$ similarly.

For graphs $G_1$, $G_2$ on $n$ vertices, there is a much more natural variant of
their cut distance: let $\hat{d}_{\mathrm{cut}}(G_1,G_2)$ be the smallest $\varepsilon$ for which we can identify
the vertices of $G_1$ with those of $G_2$ such that for any bipartition of the vertex
set, the corresponding cuts in $G_1$ and $G_2$ have sizes within $\varepsilon n^2$. In terms of
kernels,

$$\hat{d}_{\mathrm{cut}}(G_1,G_2) = \min_{\kappa_2' \approx_n \kappa_{G_2}} \|\kappa_{G_1} - \kappa_2'\|_{\mathrm{cut}}, \qquad (9)$$

where $\kappa_1 \approx_n \kappa_2$ if (6) holds for some map $\tau$ that simply permutes the intervals
$I_i$ corresponding to the vertices, and we take (4) as the definition of the cut
norm. Note that the supremum implied by (4) in the definition (9) is over all
bipartitions of $[0,1]$, not just those corresponding to bipartitions of the vertices;
it is very easy to see that this makes no difference: the supremum is attained
at a vertex bipartition.

Comparing (8) and (9), since the infimum in the former is taken over a larger
set, one trivially has $d_{\mathrm{cut}}(G_1,G_2) \le \hat{d}_{\mathrm{cut}}(G_1,G_2)$. Borgs, Chayes, Lovász, Sós
and Vesztergombi [15] noted that strict inequality is possible. For example,
taking (4) as the definition of the cut norm, let $G_1$ be a triangle, and let $G_2$
be the graph with 3 vertices and one edge. For any pairing of the vertices of
$G_1$ with those of $G_2$, the 'worst' cut is the one in which the isolated vertex
of $G_2$ is placed into one part and the other two vertices into the other part.
This cut has 2 edges in $G_1$ but no edges in $G_2$, so $\hat{d}_{\mathrm{cut}}(G_1,G_2) = 2/9$. On the
other hand, consider the blow-ups $G_1^{(2)}$, a complete tripartite graph with two
vertices in each class, and $G_2^{(2)}$, a $C_4$ with two isolated vertices added. Pairing
the vertices of $G_1^{(2)}$ and $G_2^{(2)}$ by placing two opposite vertices of the $C_4$ in one
class of $G_1^{(2)}$, and the other vertices in different classes, we realize $G_2^{(2)}$ as a
subgraph of $G_1^{(2)}$ in such a way that the 8 edges of $G_1^{(2)}$ not present in $G_2^{(2)}$ form
a non-bipartite graph, so every cut cuts at most 7 of these extra edges. It follows
that $\hat{d}_{\mathrm{cut}}(G_1^{(2)},G_2^{(2)}) \le 7/6^2$. In fact, one can check that with the vertices paired
in this way the maximum difference between the sizes of corresponding cuts in
$G_1^{(2)}$ and $G_2^{(2)}$ is 6, so

$$d_{\mathrm{cut}}(G_1,G_2) \le \hat{d}_{\mathrm{cut}}(G_1^{(2)},G_2^{(2)}) \le 6/6^2 = 1/6 < 2/9 = \hat{d}_{\mathrm{cut}}(G_1,G_2),$$

showing that $d_{\mathrm{cut}}$ and $\hat{d}_{\mathrm{cut}}$ do not always agree. For questions of convergence,
however, the two metrics are equivalent: as shown in [15],

$$d_{\mathrm{cut}}(G_1,G_2) \le \hat{d}_{\mathrm{cut}}(G_1,G_2) \le 32\, d_{\mathrm{cut}}(G_1,G_2)^{1/67}.$$
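
The triangle example can be verified by exhaustive search. The following sketch computes the vertex-pairing variant of the cut distance for two graphs on the same number of vertices (written $\hat{d}_{\mathrm{cut}}$ here), using the bipartition form (4) of the cut norm: it minimizes over all pairings of the vertices the largest difference between corresponding cut sizes, divided by $n^2$. The names `cut_size`, `d_hat_cut` and `blow_up_edges`, and the edge-list representation, are assumptions of this example; the search is exponential and only feasible for very small graphs.

```python
from fractions import Fraction
from itertools import permutations, product


def cut_size(edges, side):
    """Number of edges with endpoints on different sides of the bipartition."""
    return sum(1 for u, v in edges if side[u] != side[v])


def d_hat_cut(edges1, edges2, n):
    """Brute-force pairing variant of the cut distance for two n-vertex graphs:
    min over vertex pairings of the max over bipartitions of the difference
    in cut sizes, normalized by n^2."""
    best = None
    for pi in permutations(range(n)):
        relabelled = [(pi[u], pi[v]) for u, v in edges2]
        worst = max(
            abs(cut_size(edges1, side) - cut_size(relabelled, side))
            for side in product((0, 1), repeat=n)
        )
        if best is None or worst < best:
            best = worst
    return Fraction(best, n * n)


def blow_up_edges(edges, r):
    """Edge list of the blow-up: vertex u becomes the copies u*r, ..., u*r + r - 1."""
    return [(u * r + a, v * r + b) for u, v in edges
            for a in range(r) for b in range(r)]


triangle = [(0, 1), (0, 2), (1, 2)]
one_edge = [(0, 1)]                       # third vertex is isolated
small = d_hat_cut(triangle, one_edge, 3)  # equals 2/9, as computed in the text
blown = d_hat_cut(blow_up_edges(triangle, 2), blow_up_edges(one_edge, 2), 6)
# blown is at most 1/6, so passing to blow-ups strictly decreases the distance.
```

Since a graph and its blow-up correspond to the same kernel, the value obtained for the blow-ups is an upper bound on $d_{\mathrm{cut}}(G_1,G_2)$.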

At first sight it is not clear why the cut metric should be interesting: after all, 
what is the significance of two graphs having almost the same number of edges 
in all corresponding cuts? One very important consequence of this property is 
that their subgraph counts are close, as shown by the following simple lemma 
from Borgs, Chayes, Lovász, Sós and Vesztergombi [15].

Lemma 2.2. Let $\kappa$ and $\kappa'$ be two standard kernels. Then for every graph $F$ we
have

$$|s(F,\kappa) - s(F,\kappa')| \le e(F)\,\|\kappa - \kappa'\|_{\mathrm{cut}}.$$

Proof. Before we embark on the proof, we extend the definition of $s(F,\kappa)$
slightly. Fix the graph $F$, taking its vertex set to be $[k] = \{1,2,\ldots,k\}$, as usual,
and list the edges of $F$ as $\{i_1j_1,\ldots,i_mj_m\}$. Given a sequence $(\kappa_1,\ldots,\kappa_m)$ of
standard kernels, set

$$s(F;\kappa_1,\ldots,\kappa_m) = \int_{[0,1]^k} \prod_{r=1}^m \kappa_r(x_{i_r},x_{j_r}) \prod_{t=1}^k dx_t.$$

Thus $s(F,\kappa) = s(F;\kappa,\ldots,\kappa)$. We claim that for any graph $F$ with $m$ edges and
any standard kernels $\kappa_1,\kappa_2,\ldots,\kappa_m$ and $\kappa_1'$, we have

$$|s(F;\kappa_1,\kappa_2,\ldots,\kappa_m) - s(F;\kappa_1',\kappa_2,\ldots,\kappa_m)| \le \|\kappa_1 - \kappa_1'\|_{\mathrm{cut}}. \qquad (10)$$

Applying this $m = e(F)$ times, changing one kernel from $\kappa$ to $\kappa'$ each time, the
lemma follows.

It remains to prove (10), which is easy. Suppose without loss of generality
that the first edge is 12, so $i_1 = 1$ and $j_1 = 2$. Our task is to bound

$$\Delta = \int_{[0,1]^k} \bigl(\kappa_1(x_1,x_2) - \kappa_1'(x_1,x_2)\bigr) \prod_{r=2}^m \kappa_r(x_{i_r},x_{j_r}) \prod_{t=1}^k dx_t.$$

Collecting the terms in the product that involve $x_1$ or $x_2$, we may write this
product as $f_0(\mathbf{x}) f_1(x_1,\mathbf{x}) f_2(x_2,\mathbf{x})$, where $\mathbf{x} = (x_3,\ldots,x_k)$ and each $f_i$ (being
a product of standard kernels evaluated at certain places) takes values in
$[0,1]$. Now from (3), it is immediate that if $f$ and $g$ take values in $[0,1]$, then
$\left|\int \kappa(x,y)f(x)g(y)\,dx\,dy\right| \le \|\kappa\|_{\mathrm{cut}}$. Applying this with $\mathbf{x}$ fixed, and then integrating
over $\mathbf{x}$, it follows that $|\Delta| \le \|\kappa_1 - \kappa_1'\|_{\mathrm{cut}}$, as required. □
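
As a sanity check on the lemma, one can evaluate both sides exactly for finite-type kernels with equal-sized parts. The sketch below is illustrative only: it represents such a kernel by an $n \times n$ matrix of its values (an assumption of this example), computes $s(F,\kappa)$ as an average over all maps from $V(F)$ to the parts, and computes the cut norm in the form (3) by searching over unions of parts, using the fact that for a step function the integral over $S \times T$ is linear in the measure of $S$ and $T$ inside each part, so the supremum is attained at such unions. The names `s_step` and `cut_norm_step` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def s_step(F_edges, k, P):
    """s(F, kappa) for the step kernel with value P[i][j] on the (i, j)-th
    block, all n blocks having length 1/n."""
    n = len(P)
    total = Fraction(0)
    for phi in product(range(n), repeat=k):
        term = Fraction(1)
        for x, y in F_edges:
            term *= P[phi[x]][phi[y]]
        total += term
    return total / n ** k


def cut_norm_step(A):
    """Cut norm (3) of the step function with block values A[i][j],
    maximizing over unions of blocks S and T."""
    n = len(A)
    best = Fraction(0)
    for S in product((0, 1), repeat=n):
        # Column sums over the rows in S; the best T takes all positive
        # columns or all negative columns.
        col = [sum((A[i][j] for i in range(n) if S[i]), Fraction(0))
               for j in range(n)]
        pos = sum((c for c in col if c > 0), Fraction(0))
        neg = -sum((c for c in col if c < 0), Fraction(0))
        best = max(best, pos, neg)
    return best / n ** 2


half = Fraction(1, 2)
kappa = [[half, half], [half, half]]                       # constant 1/2
kappa_prime = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)]]
triangle = [(0, 1), (0, 2), (1, 2)]
diff = [[kappa[i][j] - kappa_prime[i][j] for j in range(2)] for i in range(2)]

lhs = abs(s_step(triangle, 3, kappa) - s_step(triangle, 3, kappa_prime))
rhs = len(triangle) * cut_norm_step(diff)
# Here lhs = 1/8 and rhs = 3/8, consistent with the bound of Lemma 2.2.
```

This checks a single instance and is of course no substitute for the proof; it is mainly useful for catching normalization mistakes when experimenting.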

Corollary 2.3. Let $(G_n)$ be a sequence of graphs with $|G_n| \to \infty$, and let $\kappa$ be
a standard kernel. If $d_{\mathrm{cut}}(G_n,\kappa) \to 0$ then $d_{\mathrm{sub}}(G_n,\kappa) \to 0$.

Proof. Let $\kappa_n = \kappa_{G_n}$, so by definition $d_{\mathrm{cut}}(G_n,\kappa) = d_{\mathrm{cut}}(\kappa_n,\kappa)$. By Lemma 2.2,
for every $F$ we have $s(F,\kappa_n) \to s(F,\kappa)$. But $s(F,\kappa_n) = t(F,G_n)$, while from
(1) we have $s(F,G_n) = t(F,G_n) + o(1)$. Thus $s(F,G_n) \to s(F,\kappa)$ for each $F$,
i.e., $d_{\mathrm{sub}}(G_n,\kappa) \to 0$. □

We have just seen that convergence in $d_{\mathrm{cut}}$ implies convergence in $d_{\mathrm{sub}}$; one
of the main results of Borgs, Chayes, Lovász, Sós and Vesztergombi, namely
Theorem 2.6 in [15], gives a converse of this. This result states that the metrics
$d_{\mathrm{sub}}$ (defined using $t$ rather than $s$) and $d_{\mathrm{cut}}$ are equivalent, in the sense that
$(G_n)$ is a Cauchy sequence for $d_{\mathrm{sub}}$ if and only if it is a Cauchy sequence for $d_{\mathrm{cut}}$.
In the light of the various other results of Lovász and Szegedy [33] and Borgs,
Chayes, Lovász, Sós and Vesztergombi [15], this statement may be reformulated
in our notation as follows.

Theorem 2.4. Let $(G_n)$ be a sequence of graphs or standard kernels with
$|G_n| \to \infty$, where we take $|G_n| = \infty$ if $G_n$ is a kernel, and let $\kappa$ be a standard
kernel. Then $d_{\mathrm{sub}}(G_n,\kappa) \to 0$ if and only if $d_{\mathrm{cut}}(G_n,\kappa) \to 0$. □

An immediate consequence of this result is the following, Corollary 3.10
in [15].

Corollary 2.5. Let $\kappa$ and $\kappa'$ be two bounded kernels. Then $s(F,\kappa) = s(F,\kappa')$
for every $F$ if and only if $d_{\mathrm{cut}}(\kappa,\kappa') = 0$. □

We shall return to a discussion of kernels at cut distance 0 shortly.

2.3 Kernels and (quasi-) random graphs

As well as going from graphs to kernels, one can go from kernels to random
graphs in a very natural way, as in Section 2.6 of Lovász and Szegedy [33],
or as in Bollobás, Janson and Riordan [5] for the sparse case. Indeed, given
a standard kernel $\kappa$ and an $n \ge 1$, let $G(n,\kappa)$ be the random graph on $[n]$
defined as follows: first let $x_1,\ldots,x_n$ be iid with the uniform distribution on
$[0,1]$. Given the $x_i$, join each pair of vertices independently, joining $i$ and $j$ with
probability $\kappa(x_i,x_j)$. The resulting graph is called a $\kappa$-random graph by Lovász
and Szegedy [33], although they use $W$ as their default symbol for a kernel. It
is easy to check, for example by the second moment method, that, for each $F$,
the random variable $s(F,G(n,\kappa))$ converges (in probability and in fact almost
surely) to $s(F,\kappa)$ as $n \to \infty$. Thus the sequence $G(n,\kappa)$ converges almost surely
to $\kappa$ in the metric $d_{\mathrm{sub}}$ or $d_{\mathrm{cut}}$. Note that if $\kappa$ is constant and takes the value
$p$, then we recover the usual Erdős-Rényi model $G(n,p)$: no confusion should
arise between the notation for the two models. (In fact, it was Gilbert [35] who
introduced $G(n,p)$, while Erdős and Rényi [21] introduced a model, $G(n,m)$,
that is essentially equivalent for many purposes. Since it was they who founded
the theory of random graphs, both models are often referred to as Erdős-Rényi
models.)
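
The two-stage construction of $G(n,\kappa)$ translates directly into a sampler. The sketch below follows the description above (iid uniform types, then conditionally independent edges); the function name `sample_kernel_graph`, its signature, and the choice to return the types together with an edge list are assumptions made for illustration.

```python
import random


def sample_kernel_graph(n, kappa, rng=None):
    """Sample G(n, kappa) on vertex set {0, ..., n-1}.

    kappa is a symmetric function [0,1]^2 -> [0,1] (a standard kernel).
    Returns the sampled types x_0, ..., x_{n-1} and the list of edges.
    """
    if rng is None:
        rng = random.Random()
    # Stage 1: iid uniform types.
    x = [rng.random() for _ in range(n)]
    # Stage 2: given the types, join i and j independently
    # with probability kappa(x_i, x_j).
    edges = [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if rng.random() < kappa(x[i], x[j])
    ]
    return x, edges


# A constant kernel recovers G(n, p); a non-constant one gives an
# inhomogeneous graph in which vertices with larger type have higher degree.
types, gnp_edges = sample_kernel_graph(200, lambda a, b: 0.5, random.Random(1))
types2, inhom_edges = sample_kernel_graph(200, lambda a, b: a * b, random.Random(2))
```

Passing an explicit `random.Random` instance makes runs reproducible, which is convenient when comparing empirical subgraph densities with $s(F,\kappa)$.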

It is natural to view a sequence $(G_n)$ converging to $\kappa$ in $d_{\mathrm{sub}}$ as a sequence
of 'inhomogeneous quasi-random graphs': when $\kappa$ is constant, the convergence
condition is equivalent to the standard notion of quasi-randomness, introduced
by Thomason [37] in 1987 (although he called it pseudo-randomness) and studied
in great detail by Chung, Graham and Wilson [18] and many others. The
convergence of $G(n,\kappa)$ to $\kappa$ in $d_{\mathrm{sub}}$ establishes that sequences generated by
the natural inhomogeneous random model are also quasi-random, as one would
hope. One of the most pleasing features of this whole subject area is the interpretation
that inhomogeneous quasi-random graphs are completely general:
any sequence of (dense) graphs has such a subsequence.

To take an alternative viewpoint, we may think of standard kernels as uncountable
infinite graphs, and a 'typical' random graph $G(n,\kappa)$ as a good finite
approximation to $\kappa$. Then the completion of $\mathcal{F}$ is obtained by adding these
infinite graphs, and the approximations $G(n,\kappa)$ ($n$ large) are examples of finite
graphs close to a given infinite graph. Taking this viewpoint it is natural not
to identify a finite graph with a kernel. For another, slightly different, point of
view, see Diaconis and Janson [19], where connections to certain infinite random
graphs are described.

2.4 Equivalent kernels 

In the light of Corollary 2.5 it is clearly important to understand which pairs
of kernels have $d_{\mathrm{cut}}(\kappa_1,\kappa_2) = 0$; this is also important for understanding $d_{\mathrm{cut}}$
itself. Fortunately, it turns out that there is a natural notion of equivalence for
kernels which gives the answer. Since this topic is only touched on in passing in
Borgs, Chayes, Lovász, Sós and Vesztergombi [15], we shall go into some detail
here.

Roughly speaking, we would like to say that two kernels are equivalent if one
is obtained from the other simply by relabelling the 'types' in $[0,1]$. It would
seem that the notion $\approx$ of naive equivalence defined in (6) is thus the right
one, but a little thought shows that this is not the case; for this, the random
viewpoint is very helpful.

So far, as in [15], we defined kernels only on $[0,1]^2$. In view of the connection
to random graphs discussed in the previous subsection, it is a priori more natural
to work with a general probability space $(\Omega,\mathcal{F},\mu)$ rather than $[0,1]$ with
Lebesgue measure, defining a standard kernel as a symmetric measurable function
from the square of a probability space to $[0,1]$. (This is the approach taken
in the sparse case by Bollobás, Janson and Riordan [8].) However, almost all the
time, we shall consider only kernels on $[0,1]$; there are two reasons for doing so:
firstly, graphs with $n$ vertices correspond to kernels on the discrete space with $n$
equiprobable elements, and $[0,1]$ is the natural limit of these spaces. Secondly,
all probability spaces that one would ever wish to work with (all so-called 'standard'
probability spaces) are isomorphic to Lebesgue measure on an interval,
combined with (possibly) a finite or countable number of atoms. When studying
kernels, the presence of atoms makes no difference: for example, a kernel on a
finite measure space corresponds in a natural way to a piecewise constant kernel
on $[0,1]$. Hence it makes very good sense to consider only kernels on $[0,1]$. For a
formal reduction to the case of kernels on $[0,1]$ in the context of random graphs,
see Janson [27].

We may think of kernels as two-dimensional versions of random variables (not 
to be confused with vector valued random variables). Two random variables are 
equivalent if they have the same distribution. Equivalently, they are equivalent 
if they may be coupled so as to agree with probability 1. This is the definition 
we shall use for kernels. 

Working, for the moment, on general (standard) probability spaces, and
suppressing the $\sigma$-field of measurable sets in the notation, let
$(\Omega_1,\mu_1)$ and $(\Omega_2,\mu_2)$ be two probability spaces. A coupling
of $(\Omega_1,\mu_1)$ and $(\Omega_2,\mu_2)$ is simply a probability space
$(\Omega,\mu)$ together with measure-preserving maps
$\sigma_i:\Omega\to\Omega_i$, $i=1,2$. Thus, if $X$ is a uniformly random point
of $(\Omega,\mu)$, then $\sigma_1(X)$ and $\sigma_2(X)$ are uniform on
$(\Omega_1,\mu_1)$ and $(\Omega_2,\mu_2)$, respectively. Let $\kappa_i$ be a
kernel on $(\Omega_i,\mu_i)$, $i=1,2$. Then $\kappa_1$ and $\kappa_2$ are
equivalent if there is a coupling $(\Omega,\mu,\sigma_1,\sigma_2)$ of the
underlying probability spaces such that

\[ \kappa_1(\sigma_1(x),\sigma_1(y)) = \kappa_2(\sigma_2(x),\sigma_2(y))
   \quad\text{for } (\mu\times\mu)\text{-almost every } (x,y)\in\Omega^2.
   \tag{10} \]

In other words, extending the notation in (5) to arbitrary spaces, we require
$\kappa_1^{(\sigma_1)} = \kappa_2^{(\sigma_2)}$ a.e.; we write $\sim$ for the
corresponding relation. Although this definition may seem a little complicated,
as explained above it is in fact very natural.

Note that $\kappa_1\approx\kappa_2$ implies $\kappa_1\sim\kappa_2$: if
$\kappa_1=\kappa_2^{(\tau)}$ then one couples $x\in[0,1]=\Omega_1$ with
$\tau(x)\in\Omega_2$. (More formally, we may take $\Omega=\Omega_1$, with
$\sigma_1$ the identity and $\sigma_2=\tau$.) It is easy to see that the
reverse implication does not hold: for example, consider the random variables
$A_1$, $A_2$ on $[0,1]$ given by $A_1(x)=x$ and
$A_2(x)=2x-\lfloor 2x\rfloor$; these both have the uniform distribution, but
since one is 1-to-1 and the other 2-to-1, there is no measure-preserving
bijection from one ground space to the other transforming one into the other.
Setting $\kappa_i(x,y)=A_i(x)A_i(y)$, one obtains kernels with
$\kappa_1\sim\kappa_2$ but $\kappa_1\not\approx\kappa_2$. (Recently, Borgs,
Chayes and Lovasz [12] have shown that if one excludes this phenomenon of
'twins', then $\sim$ and $\approx$ are equivalent; we refer the reader there
for a precise statement.)
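As a numerical sanity check on the 'twins' example, the following minimal
sketch (not part of the original argument; the function names `a1`, `a2` and
`moment` are illustrative assumptions) estimates the moments of $A_1$ and $A_2$
by a midpoint rule. Both should agree with the uniform moments $1/(k+1)$, even
though $A_2$ takes each value at two distinct points.

```python
import math

def a1(x):
    """A_1(x) = x, a 1-to-1 map of [0, 1]."""
    return x

def a2(x):
    """A_2(x) = 2x - floor(2x), a 2-to-1 map of [0, 1)."""
    return 2 * x - math.floor(2 * x)

def moment(f, k, steps=20_000):
    """Midpoint-rule estimate of E[f(U)^k] for U uniform on [0, 1]."""
    h = 1.0 / steps
    return sum(f((i + 0.5) * h) ** k for i in range(steps)) * h

for k in range(1, 5):
    # The three columns should agree up to discretisation error.
    print(k, moment(a1, k), moment(a2, k), 1 / (k + 1))
```

Equal moments of a bounded variable determine its distribution, so this is
consistent with $A_1$ and $A_2$ both being uniform; it does not, of course,
replace the coupling argument in the text.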

Returning to the special case of kernels on $[0,1]$, essentially equivalent to
the general case, couplings have a very simple description. All that matters is
that, for a uniform point $X$ of $(\Omega,\mu)$, the distribution of
$(\sigma_1(X),\sigma_2(X))$ should have uniform marginals. Thus, couplings
correspond to doubly stochastic measures, i.e., Borel measures $\mu$ on
$[0,1]^2$ with both marginals Lebesgue measure. In other words, we have
$\kappa_1\sim\kappa_2$ if and only if there is a doubly stochastic measure
$\mu$ such that

\[ \kappa_1(x,y) = \kappa_2(u,v)
   \quad\text{for } (\mu\times\mu)\text{-a.e. } (x,u,y,v)\in[0,1]^4.
   \tag{11} \]

At first sight, $[0,1]^2$ is the most natural space to use to couple two
kernels on $[0,1]$, but there is another natural choice. Since $[0,1]^2$ is
isomorphic as a probability space to $[0,1]$, we may construct the coupling on
$[0,1]$! Hence, $\kappa_1\sim\kappa_2$ if and only if there are
measure-preserving maps $\sigma_1,\sigma_2:[0,1]\to[0,1]$ such that
$\kappa_1^{(\sigma_1)}=\kappa_2^{(\sigma_2)}$ for (Lebesgue) a.e.
$(x,y)\in[0,1]^2$. Putting this a little more symmetrically, we see that
$\kappa_1\sim\kappa_2$ if and only if

\[ \exists\,\kappa,\sigma_1,\sigma_2 \text{ such that }
   \kappa=\kappa_1^{(\sigma_1)} \text{ a.e. and }
   \kappa=\kappa_2^{(\sigma_2)} \text{ a.e.,} \tag{12} \]

where $\kappa$ is a kernel on $[0,1]$ and $\sigma_1$ and $\sigma_2$ are
measure-preserving maps from $[0,1]$ to itself. Note that
$\kappa\sim\kappa^{(\sigma)}$ for any kernel $\kappa$ on $[0,1]$ and any
measure-preserving map $\sigma$ from $[0,1]$ to itself.

Since couplings rather than rearrangements give the proper notion of
equivalence for two kernels, it is natural to use couplings rather than
rearrangements in the definition of the cut metric. Indeed, Borgs, Chayes,
Lovasz, Sos and Vesztergombi [15] define the cut metric on standard (or simply
bounded) kernels as follows:

\[ d_{\mathrm{cut}}(\kappa_1,\kappa_2) = \inf_{\mu\in\mathcal{M}} \sup_{S,T}
   \left| \int_{S\times T} \bigl(\kappa_1(x,y)-\kappa_2(u,v)\bigr)
   \,d\mu(x,u)\,d\mu(y,v) \right|, \tag{13} \]

where $\mathcal{M}$ is the set of doubly stochastic measures on $[0,1]^2$, $S$
and $T$ run over measurable subsets of $[0,1]^2$, and the integral is over
$(x,u)\in S$ and $(y,v)\in T$. As shown in [15], the definitions (7) and (13)
coincide. (This is not hard to see: either formula defines a function that is
continuous, indeed Lipschitz with constant 1, with respect to the cut norm, and
hence continuous with respect to the $L^1$ norm. Since the finite-type kernels
are dense in $L^1$, it suffices to check the equality of the two definitions
for finite-type kernels, which is straightforward. For the details, see [15].)
Since (7) is much easier to work with than (13), we shall take the former as
our definition of $d_{\mathrm{cut}}$.

Although (7) is more convenient, there is a sense in which (13) is the 'right'
definition. For example, as we shall now show, the infimum in (13) is always
attained, unlike that in (7). This is not discussed in [15], where it is of no
particular significance. Here, as in the bulk of the paper, unless otherwise
specified, all kernels are kernels on $[0,1]$, i.e., symmetric
Lebesgue-measurable functions from $[0,1]^2$ to $[0,\infty)$. As noted above,
it always suffices to consider kernels on $[0,1]$. Recall that we call a kernel
standard if it takes values in $[0,1]$.

Lemma 2.6. Let $\kappa_1$ and $\kappa_2$ be two standard kernels. Then there is
a doubly stochastic measure $\mu$ achieving the infimum in (13).



Proof. For $\mu\in\mathcal{M}$ set

\[ d_\mu(\kappa_1,\kappa_2) = \sup_{S,T}
   \left| \int_{S\times T} \bigl(\kappa_1(x,y)-\kappa_2(u,v)\bigr)
   \,d\mu(x,u)\,d\mu(y,v) \right|, \tag{14} \]

so our aim is to show that
$\inf_{\mu\in\mathcal{M}} d_\mu(\kappa_1,\kappa_2)$ is attained. Before doing
so, let us note that in the supremum one may restrict the sets $S$ and $T$ in
(14) to 'nice' sets. Let $\mathcal{D}$ denote the set of finite unions of
products of (half-open) intervals. Since $\mu$ is a finite Borel measure, for
any measurable $S,T\subset[0,1]^2$ and any $\varepsilon>0$, there are sets
$S',T'\in\mathcal{D}$ with
$\mu(S\triangle S'),\ \mu(T\triangle T')<\varepsilon$. Since
$\kappa_1-\kappa_2$ is bounded by $\pm1$, replacing $S$, $T$ by $S'$ and $T'$
changes the value of the integral by at most $2\varepsilon$. It follows that
the supremum in (14) may be taken over $S,T\in\mathcal{D}$ without changing its
value, as claimed.

It is well known that $\mathcal{M}$ is (sequentially) compact in the topology
in which $\mu_n\to\mu$ if and only if $\mu_n(A)\to\mu(A)$ for every set
$A\in\mathcal{D}$. Indeed, writing $\mathcal{D}_0$ for the set of products of
intervals with rational endpoints, since $\mathcal{D}_0$ is countable any
sequence in $\mathcal{M}$ has a subsequence $(\mu_n)$ such that $(\mu_n(A))$
converges for all $A\in\mathcal{D}_0$. Using the doubly stochastic property to
bound the measure of a rectangle with one or more short sides, convergence for
all $A\in\mathcal{D}$ follows easily, and one can check that the limiting
values do define a measure $\mu$. Note that one cannot require
$\mu_n(A)\to\mu(A)$ for every measurable $A$: it is easy to construct
sequences where $\mu$ is concentrated on, for example, the diagonal
$S=\{(x,x)\}$, with $\mu_n(S)=0$ for every $n$.

Let $(\mu_n)$ be a sequence of doubly stochastic measures for which
$d_{\mu_n}(\kappa_1,\kappa_2)\to d_{\mathrm{cut}}(\kappa_1,\kappa_2)$; such a
sequence exists by the definition (13) of
$d_{\mathrm{cut}}(\kappa_1,\kappa_2)$. From the remark above, $(\mu_n)$ has a
subsequence converging to some $\mu\in\mathcal{M}$ in the appropriate topology.
Restricting to this subsequence, we may assume that $\mu_n(A)\to\mu(A)$ for
every $A\in\mathcal{D}$.

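Let $S=S_1\times S_2$ and $T=T_1\times T_2$, where $S_1$, $S_2$, $T_1$ and
$T_2$ are all intervals in $[0,1]$. We claim that

\[ \int_{S\times T} \kappa(x,y)\,d\mu_n(x,u)\,d\mu_n(y,v) \to
   \int_{S\times T} \kappa(x,y)\,d\mu(x,u)\,d\mu(y,v) \tag{15} \]

as $n\to\infty$, for any standard kernel $\kappa$. Before proving this, let us
show that the lemma follows.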

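For any $\nu\in\mathcal{M}$, let

\[ f(S,T,\nu) = \int_{S\times T} \bigl(\kappa_1(x,y)-\kappa_2(u,v)\bigr)
   \,d\nu(x,u)\,d\nu(y,v), \]

so $d_\nu(\kappa_1,\kappa_2)=\sup_{S,T}|f(S,T,\nu)|$. Applying (15) with
$\kappa=\kappa_1$ and $\kappa=\kappa_2$, we see that
$f(S,T,\mu_n)\to f(S,T,\mu)$ holds whenever $S$ and $T$ are products of
intervals. By additivity, it thus holds whenever $S$ and $T$ are in
$\mathcal{D}$. Since
$d_{\mu_n}(\kappa_1,\kappa_2)=\sup_{S,T}|f(S,T,\mu_n)|$, for
$S,T\in\mathcal{D}$ we thus have

\[ |f(S,T,\mu)| = \liminf |f(S,T,\mu_n)|
   \le \liminf d_{\mu_n}(\kappa_1,\kappa_2)
   = d_{\mathrm{cut}}(\kappa_1,\kappa_2). \]

As noted earlier, when defining
$d_\mu(\kappa_1,\kappa_2)=\sup_{S,T}|f(S,T,\mu)|$, we may take the supremum
instead over $S,T\in\mathcal{D}$, so it follows that
$d_\mu(\kappa_1,\kappa_2)\le d_{\mathrm{cut}}(\kappa_1,\kappa_2)$. Since
$d_{\mathrm{cut}}(\kappa_1,\kappa_2)
=\inf_{\mu'\in\mathcal{M}} d_{\mu'}(\kappa_1,\kappa_2)$, this infimum is
attained (at $\mu$), as claimed.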




It remains to prove (15). But this is easy: for any interval
$I\subset[0,1]$, let $\mu_n^I$ be the measure on $[0,1]$ defined by

\[ \mu_n^I(A) = \mu_n(A\times I), \]

and define $\mu^I$ from $\mu$ similarly. Recall that $\mu_n\to\mu$ on products
of intervals. Thus $\mu_n^I(A)\to\mu^I(A)$ whenever $A$ is an interval, and
hence whenever $A$ is a finite union of intervals. Since
$\mu_n,\mu\in\mathcal{M}$, we have that $\mu_n^I(A)$ and $\mu^I(A)$ are both at
most the Lebesgue measure of $A$. It follows that $\mu_n^I(A)\to\mu^I(A)$ for
any measurable $A\subset[0,1]$, since for any $\varepsilon$ we can approximate
$A$ by a finite union of intervals $A'$ whose symmetric difference from $A$ has
Lebesgue measure at most $\varepsilon$. It also follows that if $I$ and $J$ are
two intervals, and $A\subset[0,1]^2$ is Lebesgue measurable, then

\[ (\mu_n^I\times\mu_n^J)(A) \to (\mu^I\times\mu^J)(A). \]

Indeed, this follows by approximating $A$ by a finite union of products of
intervals. Considering level sets, we see that

\[ \int f(x,y)\,d\mu_n^I(x)\,d\mu_n^J(y) \to
   \int f(x,y)\,d\mu^I(x)\,d\mu^J(y) \]

for any bounded measurable function $f$. Taking
$f=\kappa(x,y)\,1_{x\in S_1}1_{y\in T_1}$, $I=S_2$ and $J=T_2$, this is
exactly (15), completing the proof. □

The special case of Lemma 2.6 where the distance is 0 is of particular
interest.

Corollary 2.7. Let $\kappa_1$ and $\kappa_2$ be two standard kernels. Then
$d_{\mathrm{cut}}(\kappa_1,\kappa_2)=0$ if and only if
$\kappa_1\sim\kappa_2$.

Proof. Using (13) as the definition of $d_{\mathrm{cut}}$, if
$\kappa_1\sim\kappa_2$ then we certainly have
$d_{\mathrm{cut}}(\kappa_1,\kappa_2)=0$; see (11).

Suppose then that $d_{\mathrm{cut}}(\kappa_1,\kappa_2)=0$. From Lemma 2.6
there is a $\mu\in\mathcal{M}$ such that $d_\mu(\kappa_1,\kappa_2)=0$. Let
$\nu$ be the signed measure on $[0,1]^4$ defined by

\[ d\nu(x,u,y,v) = \bigl(\kappa_1(x,y)-\kappa_2(u,v)\bigr)
   \,d\mu(x,u)\,d\mu(y,v). \]

Then $d_\mu(\kappa_1,\kappa_2)=0$ says exactly that $\nu(S\times T)=0$ for all
measurable $S,T\subset[0,1]^2$. Since $\nu$ is a signed Borel measure, it
follows immediately that $\nu$ is the zero measure. Equivalently,
$\kappa_1(x,y)-\kappa_2(u,v)=0$ for $(\mu\times\mu)$-a.e. points
$(x,u,y,v)$. Referring to (11) again, we see that $\kappa_1\sim\kappa_2$. □

As we have seen, Corollary 2.7 is a simple exercise in measure theory. Using
this corollary, and the equivalence of $d_{\mathrm{cut}}$ and
$d_{\mathrm{sub}}$ proved by Borgs, Chayes, Lovasz, Sos and Vesztergombi [15],
one obtains the following characterization of equivalent (standard) kernels.

Theorem 2.8. Let $\kappa_1$ and $\kappa_2$ be two standard kernels. Then
$s(F,\kappa_1)=s(F,\kappa_2)$ holds for every finite graph $F$ if and only if
$\kappa_1\sim\kappa_2$.

Proof. Immediate from Corollaries 2.5 and 2.7. □



The analogue of Theorem 2.8 for general (i.e., unbounded) kernels is false,
even for 'rank 1' kernels with all counts $s(F,\kappa)$ finite. Indeed, if
$\kappa(x,y)=f(x)f(y)$ for some $f:[0,1]\to[0,\infty)$, then the quantities
$s(F,\kappa)$ are easily seen to be products of moments of $f$, viewed as a
random variable. As is well known, there are non-negative random variables with
the same finite moments but different distributions; using two such random
variables, one can construct non-equivalent unbounded kernels
$\kappa_1,\kappa_2$ with $s(F,\kappa_1)=s(F,\kappa_2)<\infty$ for all $F$.

We have shown that it is not hard to deduce Theorem 2.8 from Theorem 2.4. In
fact, these results are equivalent! The reverse implication is actually much
easier.

Theorem 2.8 $\Rightarrow$ Theorem 2.4. We write out the argument for a
sequence of graphs; the treatment for kernels is essentially the same. Let
$(G_n)$ be a sequence of graphs with $|G_n|\to\infty$, and let $\kappa$ be a
standard kernel. From Corollary 2.3, if $d_{\mathrm{cut}}(G_n,\kappa)\to0$,
then $d_{\mathrm{sub}}(G_n,\kappa)\to0$; it remains to prove the reverse
implication.

As shown by Lovasz and Szegedy [34] (see their Lemmas 5.1 and 5.2), repeatedly
applying even the weak Frieze-Kannan [23] form of Szemeredi's Lemma, it is easy
to prove that any sequence $(G_n)$ with $|G_n|\to\infty$ has a subsequence
converging in $d_{\mathrm{cut}}$ to some standard kernel $\kappa'$. We shall
not give the details of this argument here as we shall prove a corresponding
statement in a more general setting in Corollary 4.7.

Suppose then that $d_{\mathrm{sub}}(G_n,\kappa)\to0$. Then by the observation
above there is a subsequence $(G_{n_k})$ that converges in $d_{\mathrm{cut}}$
to some standard kernel $\kappa'$. But then, by Corollary 2.3, we have
$d_{\mathrm{sub}}(G_{n_k},\kappa')\to0$. Since
$d_{\mathrm{sub}}(G_n,\kappa)\to0$ we must have
$d_{\mathrm{sub}}(\kappa,\kappa')=0$, i.e., $s(F,\kappa)=s(F,\kappa')$ for all
$F$. Thus, by Theorem 2.8, we have $\kappa\sim\kappa'$, so
$d_{\mathrm{cut}}(\kappa,\kappa')=0$. Thus
$d_{\mathrm{cut}}(G_{n_k},\kappa)\to0$.

We have shown that $(G_n)$ has a subsequence converging to $\kappa$ in
$d_{\mathrm{cut}}$. This argument applies equally well to any subsequence of
$(G_n)$, and it follows immediately that the whole sequence converges, i.e.,
$d_{\mathrm{cut}}(G_n,\kappa)\to0$, as required. □

As we have just seen, Theorem 2.4, one of the main results of Borgs, Chayes,
Lovasz, Sos and Vesztergombi [15], is equivalent to Theorem 2.8. As far as we
are aware, this observation is new. Now Theorem 2.8 is a fundamental analytic
fact about bounded kernels: it says that a bounded kernel is characterized up
to equivalence by the quantities $s(F,\kappa)$, which are the natural analogues
for a kernel of the moments of a random variable. When the first version of
this paper was written, we thus had the following rather unsatisfactory
situation: the only known proof of the analytic fact Theorem 2.8 was that given
above, relying on the hard results of Borgs, Chayes, Lovasz, Sos and
Vesztergombi [15] about sequences of graphs. Fortunately, this situation has
now been resolved: Borgs, Chayes and Lovasz [12] have given a very clever
direct proof of Theorem 2.8.
In fact, they proved a little more.

Recall from (12) that $\kappa_1\sim\kappa_2$ means that

\[ \exists\,\kappa,\sigma_1,\sigma_2 \text{ such that }
   \kappa=\kappa_1^{(\sigma_1)} \text{ a.e. and }
   \kappa=\kappa_2^{(\sigma_2)} \text{ a.e.,} \]

where $\kappa$ is a kernel on $[0,1]$ and $\sigma_1$ and $\sigma_2$ are
measure-preserving maps from $[0,1]$ to itself. Turning this 'upside-down', let
us write $\kappa_1\sim'\kappa_2$ if

\[ \exists\,\kappa,\sigma_1,\sigma_2 \text{ such that }
   \kappa_1=\kappa^{(\sigma_1)} \text{ a.e. and }
   \kappa_2=\kappa^{(\sigma_2)} \text{ a.e.} \tag{16} \]

In (16), we require $\kappa$ to be a kernel on $[0,1]$; it makes no difference
if we allow $\kappa$ to be a kernel on an arbitrary standard probability space.
Note that if $\kappa_1\sim'\kappa_2$, then using the observation that
$\kappa\sim\kappa^{(\sigma)}$ twice, we have $\kappa_1\sim\kappa_2$.
Borgs, Chayes and Lovasz proved the following result.

Theorem 2.9. For two bounded kernels $\kappa_1$, $\kappa_2$, the following are
equivalent: (a) $s(F,\kappa_1)=s(F,\kappa_2)$ for every finite graph $F$,
(b) $\kappa_1\sim\kappa_2$ and (c) $\kappa_1\sim'\kappa_2$.

The important implication is that if $s(F,\kappa_1)=s(F,\kappa_2)$ for all
$F$, then $\kappa_1\sim'\kappa_2$. As noted above, this trivially implies
$\kappa_1\sim\kappa_2$, which in turn easily implies
$s(F,\kappa_1)=s(F,\kappa_2)$. The proof in [12] is direct, but somewhat
technical.

As shown above, Theorem 2.9, which trivially implies Theorem 2.8, implies
Theorem 2.4. This gives a proof of Theorem 2.4 that is very different from that
given by Borgs, Chayes, Lovasz, Sos and Vesztergombi [15].

Our aim in the rest of this paper is to investigate the extent to which the
various results and observations above carry over to sparse graphs, that is,
graphs with $n$ vertices and $o(n^2)$ edges. As we shall see, this gives rise
to many difficult questions, so we shall present many more questions than
answers.



3 Subgraph counts for sparse graphs 

In this section we consider sparse graphs, where the number of edges is
$o(n^2)$ as the number $n$ of vertices goes to infinity. We shall assume
throughout that we have at least $\omega(n)$ edges, i.e., that the average
degree tends to infinity; often, we shall make much stronger assumptions. Given
a function $p=p(n)$, one can adapt many of the notions of Section 2 to graphs
with $\Theta(pn^2)$ edges. Indeed, let

\[ s_p(F,G) = \frac{\mathrm{emb}(F,G)}{p^{e(F)}\,n_{(|F|)}}
            = \frac{X_F(G)}{p^{e(F)}\,X_F(K_n)}, \]

noting that

\[ s_p(F,G) = \frac{\mathrm{emb}(F,G)}{\mathbb{E}(\mathrm{emb}(F,G(n,p)))}. \]

Also, let

\[ t_p(F,G) = \frac{\mathrm{hom}(F,G)}{p^{e(F)}\,n^{|F|}}. \]

If $p=1$, then we recover the definitions in Section 2. Furthermore, if
$0<p<1$ is constant, then we can define a map $s_p$ as before, but now $s_p$
maps $\mathcal{F}$ into the compact space
$\prod_{F\in\mathcal{F}}[0,p^{-e(F)}]$, and everything proceeds as before. More
generally, changing $p$ by a constant factor will be irrelevant: just as we can
use $s_c$ for any $c$ to study $G(n,1/2)$, we may use $s_p$ to study
$G(n,p/2)$ or $G(n,2p)$, say, for any $p=p(n)$.
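To make the normalizations concrete, here is a minimal brute-force sketch,
assuming $\mathrm{emb}$ counts injective edge-preserving maps and
$\mathrm{hom}$ counts all edge-preserving maps; the function names and the
adjacency-matrix representation are illustrative choices, and the enumeration
is only feasible for very small graphs.

```python
from itertools import permutations, product

def count_maps(f_edges, k, g_adj, injective):
    """Count edge-preserving maps from F (vertices 0..k-1) into G."""
    n = len(g_adj)
    maps = permutations(range(n), k) if injective else product(range(n), repeat=k)
    return sum(all(g_adj[m[u]][m[v]] for u, v in f_edges) for m in maps)

def falling(n, k):
    """Falling factorial n_(k) = n (n-1) ... (n-k+1)."""
    out = 1
    for i in range(k):
        out *= n - i
    return out

def s_p(f_edges, k, g_adj, p):
    """emb(F, G) / (p^{e(F)} n_(k))."""
    n = len(g_adj)
    return count_maps(f_edges, k, g_adj, True) / (p ** len(f_edges) * falling(n, k))

def t_p(f_edges, k, g_adj, p):
    """hom(F, G) / (p^{e(F)} n^k)."""
    n = len(g_adj)
    return count_maps(f_edges, k, g_adj, False) / (p ** len(f_edges) * n ** k)

# Example: G is the 5-cycle, F is a single edge, and p = 1/2.
n = 5
cycle = [[(i - j) % n in (1, n - 1) for j in range(n)] for i in range(n)]
edge = [(0, 1)]
print(s_p(edge, 2, cycle, 0.5), t_p(edge, 2, cycle, 0.5))
```

For a graph this small the two normalizations differ visibly, because the
denominators $n_{(k)}$ and $n^k$ are far apart; the text is concerned with the
regime where $n$ is large.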

From now on, we suppose that $p=p(n)$ is some given function of $n$, with
$p(n)\to0$ as $n\to\infty$. We wish to work in a compact space, so we shall
assume that there are constants $c_F$, $F\in\mathcal{F}$, such that
$s_p(F,G)\le c_F$ for all graphs $G$ we consider. Enumerating $\mathcal{F}$ as
$\{F_1,F_2,\ldots\}$, we may thus define a map

\[ s_p:\mathcal{F}\to X=\prod_{i=1}^{\infty}[0,c_{F_i}],\qquad
   G\mapsto\bigl(s_p(F_i,G)\bigr)_{i=1}^{\infty}, \tag{17} \]

and, using any metric $d$ on $X$ giving the product topology, an associated
metric

\[ d_{\mathrm{sub}}(G_1,G_2) = d\bigl(s_p(G_1),s_p(G_2)\bigr). \tag{18} \]

We suppress the dependence on $p$ in our notation for the metric to avoid
clutter. As in the dense case, we can extend $d_{\mathrm{sub}}$ to bounded
kernels $\kappa$, setting

\[ d_{\mathrm{sub}}(G,\kappa) = d\bigl(s_p(G),s(\kappa)\bigr)
   \quad\text{and}\quad
   d_{\mathrm{sub}}(\kappa_1,\kappa_2) = d\bigl(s(\kappa_1),s(\kappa_2)\bigr) \]

for a graph $G$ and bounded kernels $\kappa$, $\kappa_1$ and $\kappa_2$. Here,
for a kernel $\kappa$, $s(\kappa)$ is the vector with coordinates defined by
(2).

Much of the time, we think of a sequence $(G_n)$ of finite graphs. Throughout,
we are only interested in sequences with $|G_n|\to\infty$. For notational
convenience we always assume that $|G_n|=n$; this makes no difference to our
conjectures and results. As usual, we need not assume that $G_n$ is defined for
every $n\in\mathbb{N}$, but only for an infinite subset of $\mathbb{N}$. In
this setting, the assumption described above may be stated as follows.

Assumption 3.1 (bounded subgraph counts). For each fixed graph $F$, we have
$\sup_n s_p(F,G_n)<\infty$.

In particular, if $(G_n)$ satisfies Assumption 3.1 then, taking $F=K_2$, we see
that $e(G_n)=O(pn^2)$, so our graphs are sparse. There is a stronger version of
Assumption 3.1 that is perhaps even more natural:

Assumption 3.2 (exponentially bounded subgraph counts). There is a constant
$C$ such that, for each fixed $F$, we have
$\limsup s_p(F,G_n)\le C^{e(F)}$ as $n\to\infty$.

In this case, changing $p$ by a constant factor, we may take $C=1$ if we like.
This is not always the most natural normalization, however. There is a reason
for writing limsup in Assumption 3.2: for any graph $G_n$ with $|G_n|=n$ and
$n$ large, there will be some $F$ with $s_p(F,G_n)$ very large. Indeed, $G_n$
contains at least one embedding of itself, so
$s_p(G_n,G_n)\ge 1/(n!\,p^{e(G_n)})$, which typically grows much faster than
any constant to the power $e(G_n)$.

Turning to kernels, there is no longer any good reason to restrict our kernels
to take values in $[0,1]$: in the dense case, the maximum possible 'local
density' of edges is 1. Here, if we normalize so that $G_n$ has $pn^2/2$ edges,
say, local densities larger than $p$ are certainly possible. We shall thus
consider general kernels, i.e., symmetric measurable functions from $[0,1]^2$
to $[0,\infty)$, rather than only standard kernels. We define $s(F,\kappa)$ as
before, using (2); in general, $s(F,\kappa)$ may be infinite, but we shall
always assume it is finite for the graphs $F$ and kernels $\kappa$ we consider.

Although we allow unbounded kernels in general, it may be that they give rise
to difficulties (as they do in the general (very) sparse inhomogeneous model of
Bollobas, Janson and Riordan [5]). Assumption 3.2 corresponds to the limiting
kernel (if it exists) being bounded, as shown by Lemma 3.5 below.

Our main conjecture states that, if $p$ is large enough, then, under
Assumption 3.2, the equivalent of Theorem 2.1 holds.

Conjecture 3.3. Let $p=p(n)=n^{-o(1)}$, and let $C>0$ be constant. Suppose
that $(G_n)$ is a sequence of graphs with $|G_n|=n$ such that, for every $F$,
$s_p(F,G_n)$ converges to some constant $0\le c_F\le C^{e(F)}$. Then there is
a bounded kernel $\kappa$ such that $c_F=s(F,\kappa)$ for every $F$.

As noted above, without loss of generality we may take $C=1$. As we shall
observe later, it is very easy to see that if $s_p(K_2,G_n)\to0$ and
$s_p(F,G_n)$ is bounded for every $F$, then $s_p(F,G_n)\to0$ for every $F$.
Thus we may assume that $s_p(K_2,G_n)$ is bounded away from zero, and we may
normalize in a different way by assuming that $s_p(K_2,G_n)=1$, i.e., that
$e(G_n)=p\binom{n}{2}$.

Assumption 3.2 is trivially stronger than Assumption 3.1. Thus, if $(G_n)$
satisfies Assumption 3.2 then the sequence $s_p(G_n)$ defined by (17) lives in
a compact product space, and has a convergent subsequence. Hence there are real
numbers $c_F\ge0$, $F\in\mathcal{F}$, and a subsequence $(G_{n_i})$ with
$s_p(F,G_{n_i})\to c_F$ for every $F$, to which Conjecture 3.3 applies.
Conjecture 3.3 is thus a statement about the possible limit points of the
sequences $s_p(G_n)$.

It may well be that the restriction to bounded kernels is not necessary.

Conjecture 3.4. Let $p=p(n)=n^{-o(1)}$, and let $(G_n)$ be a sequence of
graphs with $|G_n|=n$ such that, for every $F$, we have $s_p(F,G_n)\to c_F$
for some $0\le c_F<\infty$. Then there is a kernel $\kappa$ with
$c_F=s(F,\kappa)$ for every $F$.

We have stated the above conjectures under the assumption that
$p=n^{-o(1)}$; we shall call this the almost dense case. The reason for this
assumption is discussed further below. Let us note that, in the almost dense
case, for each fixed $F$ with $k$ vertices, the denominator in the formula
$\mathrm{emb}(F,G_n)/(p^{e(F)}n_{(k)})$ for $s_p(F,G_n)$ is asymptotically
$p^{e(F)}n^k$, which is $n^{k-o(1)}$. Since there are $O(n^{k-1})$
non-injective homomorphisms from $F$ to $G_n$, it follows that
$t_p(F,G_n)\sim s_p(F,G_n)$ as $n\to\infty$, so it makes no difference whether
we consider $s_p$ or $t_p$. In general, this is not true: for example,
considering homomorphisms which map all $t$ vertices on one side of $K_{t,t}$
into a single vertex, we see that in any graph $G_n$ with $pn^2/2$ edges there
are at least $n(np)^t=(n^{2t}p^{t^2})/(np^t)^{t-1}$ non-injective
homomorphisms of $K_{t,t}$. If $np^t$ is bounded, then this is comparable to
(or larger than) the denominator in the definition of $t_p(K_{t,t},G_n)$, and
it follows that $t_p(K_{t,t},G_n)-s_p(K_{t,t},G_n)$ is bounded away from zero.
Thus, for $t_p(K_{t,t},G_n)\sim s_p(K_{t,t},G_n)$ to hold with both quantities
bounded, we need $np^t\to\infty$. This condition holds for every $t$ only in
the almost dense case $p=n^{-o(1)}$.
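The comparison between the lower bound $n(np)^t$ and the denominator
$p^{t^2}n^{2t}$ of $t_p(K_{t,t},\cdot)$ can be checked numerically. The
following is a small sketch with illustrative function names (not from the
paper); the ratio it computes equals $(np^t)^{-(t-1)}$, so it stays bounded
away from zero exactly when $np^t$ is bounded.

```python
import math

def noninjective_lower_bound(n, p, t):
    """n (np)^t: maps of K_{t,t} sending one whole side to a single vertex."""
    return n * (n * p) ** t

def t_p_denominator(n, p, t):
    """p^{e(F)} n^{|F|} for F = K_{t,t}, which has t^2 edges and 2t vertices."""
    return p ** (t * t) * n ** (2 * t)

def ratio(n, p, t):
    """Lower bound on the contribution of these maps to t_p(K_{t,t}, G_n)."""
    return noninjective_lower_bound(n, p, t) / t_p_denominator(n, p, t)

n, t = 10**6, 2
for p in (0.5, n ** -0.25, n ** -0.5):
    # Compare the ratio with the closed form (n p^t)^{-(t-1)}.
    print(p, ratio(n, p, t), (n * p ** t) ** -(t - 1))
```

With $t=2$ and $p=n^{-1/2}$ we have $np^t=1$, so the non-injective maps alone
contribute a constant to $t_p$, whereas for constant $p$ the contribution is
negligible.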



3.1 Bounded and unbounded kernels 

The following simple observation illuminates the relationship between
Conjectures 3.3 and 3.4.

Lemma 3.5. Let $\kappa:[0,1]^2\to[0,\infty)$ be a kernel, and $C\ge0$ a
constant. Then we have $s(F,\kappa)\le C^{e(F)}$ for every $F$ if and only if
$\kappa\le C$ holds almost everywhere.

Proof. The result is trivial if $C=0$. Otherwise, rescaling, we may assume that
$C=1$. If $\kappa\le1$ almost everywhere, then
$s(F,\kappa)\le s(F,1)=1$ for every $F$. We may thus suppose that $\kappa>1$ on
a set of positive measure. It follows that there is some $\eta>0$ such that
$\kappa\ge(1+\eta)^2$ on a set $A$ of positive measure. Applying the Lebesgue
Density Theorem to $A$, there is some $\varepsilon>0$ and some rectangle
$R=[a,a+\varepsilon]\times[b,b+\varepsilon]\subset[0,1]^2$ such that
$\mu(A\cap R)\ge\mu(R)/(1+\eta)$. Thus, the average value of $\kappa$ on the
set $R$ is at least $1+\eta$. Let $\kappa'$ be the kernel taking the value
$1+\eta$ on $R$ and 0 elsewhere. Standard arguments from convexity show that,
for each $t$,

\[ s(K_{t,t},\kappa) \ge s(K_{t,t},\kappa')
   = \varepsilon^{2t}(1+\eta)^{t^2}. \]

Taking $t$ large enough, we find an $F=K_{t,t}$ for which $s(F,\kappa)>1$. □
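The last step of the proof relies on $t^2\log(1+\eta)$ eventually beating
$2t\log(1/\varepsilon)$. As a hedged illustration (the function names are
assumptions, and the values of $\varepsilon$ and $\eta$ are arbitrary), the
following sketch finds the smallest $t$ for which the bound
$\varepsilon^{2t}(1+\eta)^{t^2}$ exceeds 1, working with logarithms to avoid
overflow.

```python
import math

def log_count(eps, eta, t):
    """Logarithm of eps^{2t} (1 + eta)^{t^2}."""
    return 2 * t * math.log(eps) + t * t * math.log1p(eta)

def smallest_t(eps, eta):
    """Smallest t >= 1 with eps^{2t} (1 + eta)^{t^2} > 1.

    Assumes 0 < eps <= 1 and eta > 0, so the loop terminates.
    """
    t = 1
    while log_count(eps, eta, t) <= 0:
        t += 1
    return t

print(smallest_t(0.1, 0.5))
```

Solving the inequality directly gives the threshold
$t>2\log(1/\varepsilon)/\log(1+\eta)$, which is finite for every
$\varepsilon>0$ and $\eta>0$, however small.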



Lemma 3.5 shows that a kernel $\kappa$ is bounded if and only if the counts
$s(F,\kappa)$ grow at most exponentially in $e(F)$. It also shows that, in
Conjecture 3.3, we need only consider kernels $\kappa:[0,1]^2\to[0,C]$.

Let us say that a kernel has finite moments if $s(F,\kappa)<\infty$ for all
$F$. There are unbounded kernels with finite moments: the simplest way to
construct such an example is to consider the 'rank 1' case, where
$\kappa(x,y)=f(x)f(y)$ for some $f:[0,1]\to[0,\infty)$. Indeed, let $f$ be any
function from $[0,1]$ to $[0,\infty)$ with
$\mathbb{E}(f^k)=\int_0^1 f(x)^k\,dx$ finite for every $k$; for example, let
$f(x)=\log(1/x)$ for $x>0$. Set $\kappa(x,y)=f(x)f(y)$. If $F$ is a graph on
$\{1,2,\ldots,k\}$ in which vertex $i$ has degree $d_i$, then

\[ s(F,\kappa)
   = \int_{[0,1]^k} \prod_{ij\in E(F)} f(x_i)f(x_j) \prod_{i=1}^k dx_i
   = \int_{[0,1]^k} \prod_{i=1}^k f(x_i)^{d_i} \prod_{i=1}^k dx_i
   = \prod_{i=1}^k \mathbb{E}(f^{d_i}) < \infty. \]
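For the specific choice $f(x)=\log(1/x)$ one has
$\mathbb{E}(f^d)=\int_0^1\log(1/x)^d\,dx=d!$, so the product formula just
derived gives $s(F,\kappa)=\prod_i d_i!$. The following minimal sketch (the
function name and the edge-list input format are illustrative assumptions)
evaluates this from the degree sequence of $F$.

```python
import math
from collections import Counter

def rank_one_count(edges):
    """s(F, kappa) for kappa(x, y) = f(x) f(y) with f(x) = log(1/x).

    Uses s(F, kappa) = prod_i E(f^{d_i}) together with E(f^d) = d!.
    Isolated vertices contribute E(f^0) = 1, so they may be omitted.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return math.prod(math.factorial(d) for d in degree.values())

triangle = [(0, 1), (1, 2), (0, 2)]
star = [(0, 1), (0, 2), (0, 3)]
print(rank_one_count(triangle), rank_one_count(star))
```

Every count is finite, yet the counts for stars with $t$ leaves equal $t!$,
which is super-exponential in the number of edges; this is consistent with the
kernel being unbounded.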

The calculation above shows that a rank one kernel $\kappa(x,y)=f(x)f(y)$ has
finite moments if and only if $\|f\|_p<\infty$ for every $p\ge1$, and hence if
and only if $\|\kappa\|_p<\infty$ for every $p\ge1$. It is tempting to think
that this holds in general. In one direction, for any kernel $\kappa$ and any
graph $F$ on $\{1,2,\ldots,k\}$, we may write

\[ s(F,\kappa) = \int_{[0,1]^k} \prod_{ij\in E(F)} \kappa_{ij}, \]

where $\kappa_{ij}(x_1,\ldots,x_k)=\kappa(x_i,x_j)$. Thus, by Holder's
inequality,

\[ \int_{[0,1]^k} \prod_{ij\in E(F)} \kappa_{ij}
   \le \prod_{ij\in E(F)} \|\kappa_{ij}\|_{e(F)}
   = \prod_{ij\in E(F)} \|\kappa\|_{e(F)}. \]

Hence, if $\|\kappa\|_p<\infty$ for every $p\ge1$, then $s(F,\kappa)<\infty$
for every $F$. The reverse implication does not hold, however, as shown by the
following example.

Example 3.6. A kernel with finite moments but infinite 2-norm. Let us define a
sequence of independent random kernels $\kappa_0,\kappa_1,\kappa_2,\ldots$, as
follows. For $r\ge0$, let $\mathcal{P}_r$ be the partition of $[0,1]$ into
$2^{2^r}$ equal intervals, and let $\mathcal{P}_r^2$ be the corresponding
partition of $[0,1]^2$: divide $[0,1]^2$ into $4^{2^r}$ squares in the obvious
way, and take as one part of $\mathcal{P}_r^2$ the union of a square and its
reflection in the line $x=y$ (which may be the same square). Our kernel
$\kappa_r$ will be constant on each element of $\mathcal{P}_r^2$, taking the
value $2^r$ with probability $2^{-2r}$ and 0 otherwise, with the values on
different parts independent. Note that $\kappa_0$ is simply the constant kernel
with value 1.

Let $\kappa(x,y)=\sum_{r=0}^{\infty}\kappa_r(x,y)$. It is easy to see that
with probability 1 the sum converges almost everywhere (for example, recalling
that $\mu$ denotes Lebesgue measure, use the fact that
$\mathbb{E}\mu\{\kappa_r>0\}=2^{-2r}$ to deduce that, with probability 1,
$\mu\{\exists s\ge r:\kappa_s>0\}$ tends to 0 as $r\to\infty$). Also, for
large $r$, $\|\kappa_r\|_2^2$ is concentrated around its mean of
$(2^r)^2\,2^{-2r}=1$. Hence, with probability 1 we have
$\|\kappa_r\|_2^2\ge0.99$ for infinitely many $r$. Using
$(a+b)^2\ge a^2+b^2$ for $a,b\ge0$, it follows that $\|\kappa\|_2$ is infinite
with probability 1; in particular, $\kappa$ does not have all $p$-norms finite.

Turning to the finite moments property, let $F$ be any fixed graph, with $t$
vertices. Since $\kappa\ge\kappa_0=1$, we have
$s(F,\kappa)\le s(K_t,\kappa)$, so we may assume without loss of generality
that $F=K_t$. Since $\kappa$ is random, $s(K_t,\kappa)$ is a random variable.
We may write its expectation as

\[ \mathbb{E}_\kappa\mathbb{E}_x \prod_{i<j}\kappa(x_i,x_j)
   = \mathbb{E}_x\mathbb{E}_\kappa \prod_{i<j}\kappa(x_i,x_j), \]

where $\mathbb{E}_\kappa$ denotes expectation over the random choice of
$\kappa$, and $\mathbb{E}_x$ over the random choice of
$x=(x_1,\ldots,x_t)$, a sequence of $t$ iid uniform elements of $[0,1]$. Let
us fix $x$ for the moment, assuming as we may that $x_i\ne x_j$ for
$i\ne j$. Let $\ell$ be the largest $r$ such that some pair $x_i$, $x_j$ lie
in the same part of $\mathcal{P}_r$, so $0\le\ell<\infty$. Let
$\sigma=\sum_{r\le\ell}\kappa_r$ and $\tau=\sum_{r>\ell}\kappa_r$, so
$\kappa=\sigma+\tau$. For $r>\ell$, the $\binom{t}{2}$ pairs $(x_i,x_j)$,
$i<j$, all lie in different parts of $\mathcal{P}_r^2$, so the values of
$\kappa_r$ on these pairs are independent. Since different $\kappa_r$ are
independent, it follows that the values of $\tau$ on the pairs are also
independent. Now
$\|\sigma\|_\infty\le\sum_{r=0}^{\ell}\|\kappa_r\|_\infty=2^{\ell+1}-1$.
Thus,

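\[ \mathbb{E}_\kappa \prod_{i<j}\kappa(x_i,x_j)
   \le \mathbb{E}_\kappa \prod_{i<j}\bigl(2^{\ell+1}+\tau(x_i,x_j)\bigr)
   = \prod_{i<j}\bigl(2^{\ell+1}+\mathbb{E}_\kappa\tau(x_i,x_j)\bigr). \]

For any $x$ and $y$ we have
$\mathbb{E}_\kappa\kappa_r(x,y)=2^r\,2^{-2r}=2^{-r}$, from which it follows
that $\mathbb{E}_\kappa\tau(x,y)\le2$, and hence, very crudely, that

\[ \mathbb{E}_\kappa \prod_{i<j}\kappa(x_i,x_j)
   \le \prod_{i<j}\bigl(2^{\ell+1}+2\bigr) \le 2^{(\ell+2)t^2}. \]

It remains to take the expectation over $x$. Since
$\mathbb{P}(\ell=r)\le\binom{t}{2}2^{-2^r}$, we find that

\[ \mathbb{E}_x\mathbb{E}_\kappa \prod_{i<j}\kappa(x_i,x_j)
   \le \sum_{r\ge0}\binom{t}{2}\,2^{-2^r}\,2^{(r+2)t^2} < \infty, \]

noting that for any fixed $t$ the $2^{-2^r}$ term dominates. It follows that
with probability 1 we have $s(F,\kappa)<\infty$ for every $F$, giving a kernel
with finite moments but with $\|\kappa\|_2$ infinite. A simple modification,
taking the probability that $\kappa_r$ takes the value $2^r$ on a given square
to be $2^{-(1+\varepsilon)r}$ rather than $2^{-2r}$ gives, for each
$\varepsilon>0$, an example with $\|\kappa\|_{1+\varepsilon}$ infinite.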

3.2 Non-uniform random graphs 

As in the dense case, there is a key connection between convergence of the
counts $s_p(F,G_n)$ and random graphs. Given a kernel $\kappa$, let
$G_p(n,\kappa)$ be the random graph on $[n]$ obtained as follows: first choose
$x_1,\ldots,x_n$ independently and uniformly from $[0,1]$. Then, conditional
on this choice, join each pair $\{i,j\}$ of vertices independently, with
probability $\min\{p\kappa(x_i,x_j),1\}$. If $p\kappa$ is bounded by 1, then
$G_p(n,\kappa)$ is simply $G(n,p\kappa)$; we write the parameter $p$ as a
subscript to emphasize that it is part of the overall normalization: we think
of a sparse graph generated from the kernel $\kappa$, rather than a 'sparse
kernel' $p\kappa$. If $p=1/n$, then $G_p(n,\kappa)$ is a special case of the
general sparse inhomogeneous model of Bollobas, Janson and Riordan [8].
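The two-stage construction of $G_p(n,\kappa)$ can be sketched directly in
code. This is a minimal illustration under stated assumptions: the function
name `sample_gp`, the zero-based vertex labels and the representation of
$\kappa$ as a Python callable are all choices made here, not notation from the
paper.

```python
import random

def sample_gp(n, p, kappa, rng=None):
    """Sample G_p(n, kappa); returns (types, edges) on vertices 0..n-1."""
    rng = rng or random.Random()
    # Step 1: independent uniform types x_1, ..., x_n in [0, 1).
    x = [rng.random() for _ in range(n)]
    # Step 2: given the types, join each pair independently
    # with probability min{p * kappa(x_i, x_j), 1}.
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < min(p * kappa(x[i], x[j]), 1.0):
                edges.append((i, j))
    return x, edges

# With the constant kernel 1 this is the usual G(n, p).
types, edges = sample_gp(200, 0.05, lambda a, b: 1.0, random.Random(0))
print(len(edges))
```

For an unbounded kernel such as $\log(1/x)\log(1/y)$ the callable should guard
against a type equal to 0, which the generator can return with very small
probability.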

Remark 3.7. In what follows, we shall consider many statements about the
convergence of various sequences of random graphs. As usual in the theory of
random graphs, the precise notion of convergence is not important: one thinks
of 'a random graph' with certain asymptotic properties, although this makes no
formal sense. Formally, it is most natural to work throughout with convergence
in probability, but this would require us to consider 'in probability' versions
of our various assumptions, for example the (exponentially) bounded counts
Assumptions 3.1 and 3.2. In fact, it is easy to check that in all cases
considered here, the error probabilities decay fast enough to give almost sure
convergence for any coupling of the relevant probability spaces. However, we
shall not verify this explicitly, noting that one can in any case ensure almost
sure convergence by passing to a suitable subsequence.

Lemma 3.8. Let $p=p(n)=n^{-o(1)}$, and let $\kappa$ be a kernel with
$s(F,\kappa)<\infty$ for every $F$. Then
$s_p(F,G_p(n,\kappa))\overset{p}{\to}s(F,\kappa)$ for each fixed graph $F$, so
$d_{\mathrm{sub}}(G_p(n,\kappa),\kappa)\overset{p}{\to}0$. In fact, the
sequence $G_p(n,\kappa)$ converges almost surely to $\kappa$ in the metric
$d_{\mathrm{sub}}$.

Proof. It is very easy to check that, for every $F$,
$s_p(F,G_p(n,\kappa))$ is concentrated around its mean $s(F,\kappa)$: indeed,
the second moment of the number of copies of $F$ can be written as a sum of
terms $(1+o(1))\,n^{|H|}p^{e(H)}s(H,\kappa)$, and the dominant term is the
unique one with the largest power of $n$, where $H$ is the disjoint union of
two copies of $F$. (The $1+o(1)$ correction is only needed if $\kappa$ is
unbounded, and appears due to the $\min\{1,\cdot\}$ in the edge
probabilities.) This proves the first part of the result. Convergence in
probability in $d_{\mathrm{sub}}$ follows since convergence in probability in a
product topology is equivalent to convergence in probability of each
coordinate. For the final statement, see Remark 3.7. □

Lemma 3.8 implies that if $\kappa$ has finite moments, then the sequence $G_n = G_p(n,\kappa)$
has bounded subgraph counts (i.e., satisfies Assumption 3.1) with probability 1.
If $\kappa$ is bounded, then $G_n$ has exponentially bounded subgraph counts
with probability 1.

Using Lemma 3.8, it is easy to see that we must allow unbounded kernels in
Conjecture 3.3. Indeed, set $\kappa(x,y) = \log(1/x)\log(1/y)$ for $0 < x, y < 1$, say, and
let $p(n) = 1/\log n$. Then the random graphs $G_p(n,\kappa)$ satisfy Assumption 3.1
with probability 1, and

$$s_p(F, G_p(n,\kappa)) \to s(F,\kappa) < \infty$$

holds with probability 1 for every $F$. Since $\kappa$ is unbounded, by Lemma 3.5
there is no $C$ with $s(F,\kappa) \le C^{e(F)}$ for every $F$, so there is no bounded $\kappa'$ with
$s_p(F, G_p(n,\kappa)) \to s(F,\kappa')$ for every $F$.

Note that if $p$ decreases too fast with $n$, then $s_p(F, G_p(n,\kappa))$ is no longer
concentrated around its mean: for example, this is the case if $\mathbb{E}\,\mathrm{emb}(F, G_p(n,\kappa))$
does not tend to infinity. This is the reason for the assumption $p = n^{-o(1)}$ in the
various conjectures and results above: otherwise, there will be some $F$ for which
the expected number of embeddings does not tend to infinity. Note also that,
for smaller $p$, when $s_p(F,\cdot)$ and $t_p(F,\cdot)$ are no longer asymptotically equal, the
former is the more natural parameter: for a given $F$, the lower limit on $p$ below
which the corresponding parameter for $G_p(n,\kappa)$ is no longer close to $s(F,\kappa)$ is
in general much smaller for $s_p(F,\cdot)$ than for $t_p(F,\cdot)$. It may well be, however,
that the conjectures in this section (or perhaps just their proofs) fail when the
relevant parameters $s_p(F,\cdot)$ and $t_p(F,\cdot)$ are no longer asymptotically equal.

3.3 Subgraph counts in the uniform case 

Using convexity, it is very easy to check that the only possible kernel $\kappa$ with
$s(K_2,\kappa) = s(C_4,\kappa) = 1$ is the uniform kernel, with $\kappa = 1$ a.e. The following
conjecture is thus a very special case of Conjecture 3.4.

Conjecture 3.9. Let $p = p(n) = n^{-o(1)}$, and let $(G_n)$ be a sequence of graphs
with $|G_n| = n$, $e(G_n) \sim p\binom{n}{2}$, $s_p(C_4, G_n) \to 1$, and $\sup_n s_p(F, G_n) < \infty$ for
each $F$. Then $s_p(F, G_n) \to 1$ for every $F$.

Of course, there is a variant of Conjecture 3.9 where we replace Assumption 3.1
by Assumption 3.2, i.e., we demand that $\limsup_n s_p(F, G_n) \le C^{e(F)}$
for some $C < \infty$. In this uniform context there is perhaps less reason to expect
this to make a difference.

In the dense case, it is one of the basic results about quasi-random graphs
that $s_p(K_2, G_n) \to 1$ and $s_p(C_4, G_n) \to 1$ imply $s_p(F, G_n) \to 1$ for every $F$, with
no further assumptions; see Chung, Graham and Wilson [18]. In the sparse case,
this result extends easily to certain graphs $F$; here it turns out to be simpler to
work with $t_p(F, G_n)$ rather than $s_p(F, G_n)$.

Lemma 3.10. Let $p = p(n)$ with $pn^{1/2} \to \infty$, and let $(G_n)$ be a sequence of
graphs with $|G_n| = n$ such that $t_p(K_2, G_n) \to 1$ and $t_p(C_4, G_n) \to 1$. Then
$t_p(C_k, G_n) \to 1$ for each $k \ge 5$.

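Proof. Suppressing the dependence on $n$, let $A$ denote the adjacency matrix of
$G_n$, and let $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$ be the eigenvalues of $A$. For $k \ge 3$ we have

$$\hom(C_k, G_n) = \sum_{v_1,v_2,\dots,v_k \in V(G_n)} A_{v_1v_2}A_{v_2v_3}\cdots A_{v_kv_1} = \operatorname{tr}(A^k) = \sum_{i=1}^{n} \lambda_i^k,$$

so

$$t_p(C_k, G_n) = n^{-k}p^{-k}\sum_{i=1}^{n} \lambda_i^k = \sum_{i=1}^{n} \mu_i^k, \tag{19}$$

where $\mu_i = \lambda_i/(np)$ is the $i$th normalized eigenvalue of $G_n$. In particular,

$$\sum_{i} \mu_i^4 \to 1. \tag{20}$$

The maximum eigenvalue of the adjacency matrix of any graph is at least the
average degree, so

$$\mu_1 = (np)^{-1}\lambda_1 \ge (np)^{-1}(1+o(1))(n^2p)/n = 1 + o(1).$$

From (20) it follows that $\mu_1 \sim 1$ and that $\sum_{i\ge 2}\mu_i^4 \to 0$. Hence $\mu_2 \le 1$ and
$\mu_n \ge -1$ if $n$ is large enough, and then

$$\sum_{i}\mu_i^k = \mu_1^k + \sum_{i\ge 2}\mu_i^k \le \mu_1^k + \max\{|\mu_2|^{k-4}, |\mu_n|^{k-4}\}\sum_{i\ge 2}\mu_i^4 \le \mu_1^k + \sum_{i\ge 2}\mu_i^4 = 1 + o(1).$$

The same estimate bounds $\sum_{i\ge 2}|\mu_i|^k$, so the sum is also at least $\mu_1^k - o(1) = 1 - o(1)$.
Using (19) again, the result follows. □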
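
The identity $\hom(C_k, G) = \operatorname{tr}(A^k)$ behind (19) can be checked directly on a small graph. The following is a minimal sketch in plain Python, not part of the original argument; the function names and the choice of the 5-cycle as test input are illustrative assumptions.

```python
from itertools import product


def trace_of_power(adj, k):
    """Return tr(A^k) for a square 0/1 matrix given as a list of lists."""
    n = len(adj)
    power = [[int(i == j) for j in range(n)] for i in range(n)]  # identity
    for _ in range(k):
        power = [
            [sum(power[i][l] * adj[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)
        ]
    return sum(power[i][i] for i in range(n))


def hom_cycle_bruteforce(adj, k):
    """Count maps of the k-cycle into the graph that send edges to edges."""
    n = len(adj)
    return sum(
        all(adj[vs[i]][vs[(i + 1) % k]] for i in range(k))
        for vs in product(range(n), repeat=k)
    )


def t_p_cycle(adj, k, p):
    """Normalized count t_p(C_k, G) = hom(C_k, G) / (n p)^k."""
    n = len(adj)
    return trace_of_power(adj, k) / (n * p) ** k


# Toy input (an assumption for illustration): the 5-cycle, with p = d/n = 2/5.
five_cycle = [[int((i - j) % 5 in (1, 4)) for j in range(5)] for i in range(5)]
for k in (3, 4, 5):
    assert trace_of_power(five_cycle, k) == hom_cycle_bruteforce(five_cycle, k)
print(t_p_cycle(five_cycle, 4, 2 / 5))
```

The brute-force count is exponential in $k$ and is only there to confirm the trace formula on tiny inputs.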

Informally, when $pn^{1/2} \to \infty$, the parameters $s_p(C_k, G_n)$ and $t_p(C_k, G_n)$ are
equivalent. More precisely, Lemma 3.10 implies the analogous statement with
all occurrences of $t_p$ replaced by $s_p$, but this requires a little work to show.

The restriction on $p$ in Lemma 3.10 was not used in the proof. However, if
$G_n$ has average degree $d$, then it contains at least $n\binom{d}{2}$ pairs of adjacent edges.
Thus, writing $N_{ij}$ for the number of common neighbours of $i$ and $j$, the sum of
$N_{ij}$ over ordered pairs $i \ne j$ is at least $2n\binom{d}{2} = nd(d-1)$. Hence, the number of
homomorphisms from $C_4$ to $G_n$ with a given pair of opposite vertices mapped
to distinct vertices is

$$\sum_{i\ne j} N_{ij}^2 \ge \frac{1}{n(n-1)}\Bigl(\sum_{i\ne j} N_{ij}\Bigr)^2 \ge \frac{nd^2(d-1)^2}{n-1}.$$

The number of homomorphisms with a given pair of opposite vertices mapped
to the same vertex is simply the sum of the squares of the degrees in $G_n$, which
is at least $nd^2$. Thus,

$$\hom(C_4, G_n) \ge \frac{nd^2(d-1)^2}{n-1} + nd^2 \tag{21}$$

for any graph $G_n$ with $n$ vertices and average degree $d$. With $d \sim pn \to \infty$, this
gives $\hom(C_4, G_n) \ge (1+o(1))(n^4p^4 + n^3p^2)$, i.e., $t_p(C_4, G_n) \ge (1+o(1))(1 + n^{-1}p^{-2})$.
Consequently, $t_p(C_4, G_n) \sim 1$ implies $pn^{1/2} \to \infty$. When $pn^{1/2} \to \infty$,
(21) reduces to the well-known fact that, in this case, $e(G_n) \sim p\binom{n}{2}$ implies that

$$t_p(C_4, G_n),\ s_p(C_4, G_n) \ge 1 - o(1).$$
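
As a sanity check on (21), one can compare both sides on small graphs with average degree at least 1. The sketch below is illustrative only: the helper names and the test graphs are assumptions, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction


def hom_c4(adj):
    """hom(C_4, G) as the sum of squared codegrees over all ordered pairs (i, j)."""
    n = len(adj)
    return sum(
        sum(adj[i][l] * adj[l][j] for l in range(n)) ** 2
        for i in range(n)
        for j in range(n)
    )


def c4_lower_bound(adj):
    """Right-hand side of (21), with d the average degree of the graph."""
    n = len(adj)
    d = Fraction(sum(map(sum, adj)), n)
    return n * d**2 * (d - 1) ** 2 / (n - 1) + n * d**2


# Hypothetical test graphs: the complete graph K_4 and the 5-cycle.
complete_4 = [[int(i != j) for j in range(4)] for i in range(4)]
five_cycle = [[int((i - j) % 5 in (1, 4)) for j in range(5)] for i in range(5)]
for graph in (complete_4, five_cycle):
    assert hom_c4(graph) >= c4_lower_bound(graph)
```

For the complete graph the two sides coincide, since all degrees and all codegrees of distinct pairs are equal, so both convexity steps are tight.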

In the dense case, Lemma 3.10 extends to triangles. Indeed, $\operatorname{tr}(A^2)$ counts
the number of closed walks of length 2 in $G$, which is just $2e(G)$. Thus

$$\sum_{i}\mu_i^2 = (np)^{-2}\,2e(G) \sim p^{-1}.$$

If $p$ is bounded away from zero then it follows that $\sum_{i\ge 2}\mu_i^2$ is bounded as
$n \to \infty$. Since $\sum_{i\ge 2}\mu_i^4 \to 0$, it follows by the Cauchy-Schwarz inequality that
$\sum_{i\ge 2}|\mu_i|^3 \to 0$, and hence that $s_p(C_3, G_n) \to 1$.

To obtain a result for triangles in the sparse case by this method, one
needs stronger assumptions. Defining $p$ by $e(G) = n^2p/2$, if we assume that
$t_p(C_4, G_n) = 1 + o(p)$, then arguing as above we find that $\sum_{i\ge 2}\mu_i^4 = o(p)$ and
$\sum_{i\ge 2}\mu_i^2 \le 1/p$, so Cauchy-Schwarz does give $\sum_{i\ge 2}|\mu_i|^3 \to 0$. In general, many
results for quasi-random graphs extend to the sparse case with similar modifications,
where $o(1)$ error terms are replaced by suitable functions of $p$; see, for
example, the results of Thomason [37, 38] on $(p,\alpha)$-jumbled graphs. Our aim
here is different; we wish to assume only convergence in the relevant metric,
making no assumption about the rate of convergence.

When $p \to 0$, the conditions of Lemma 3.10 do not guarantee the 'right'
number of triangles, as our next two examples will show.

Example 3.11. Very sparse graphs with too few triangles. Throughout
this example we assume that $p_1(n)$ and $p_2(n)$ are functions of $n$ satisfying

$$p_2 = (1 - p_1^2)^{n-2} \tag{22}$$

and $p_1, p_2 = \Theta(\sqrt{\log n}/\sqrt{n})$. To be concrete, we may take $p_2 = \sqrt{\log n}/\sqrt{n}$,
in which case the corresponding $p_1$ satisfies $p_1 \sim p_2/\sqrt{2}$. Suppressing the
dependence on $n$, let $G$ be the usual Erdős-Rényi random graph $G = G(n, p_1)$,
and let $H$ be the graph on the same vertex set $[n]$ in which vertices $i$ and $j$ are
joined if and only if they do not have a common neighbour in $G$. From (22),
each edge of $H$ is present with probability $p_2$; note that the edges of $H$ are not
present independently of one another. For any set $E$ of $r = O(1)$ possible edges
of $H$, the edges of $E$ are all present if and only if no vertex of $G$ is joined to
both ends of some edge in $E$. Considering each vertex of $G$ separately, we see
that the probability of this event is

$$(1 - rp_1^2 + O(p_1^3))^{n+O(1)} = e^{-rnp_1^2 + O(np_1^3)} \sim \bigl((1-p_1^2)^{n-2}\bigr)^r = p_2^r,$$

where the $O(1)$ correction in the first exponent is to account for vertices that
are endpoints of one or more edges in $E$. In other words, the probability that
a bounded number of edges is present in $H$ is asymptotically the corresponding
probability for $G(n, p_2)$.

For $E_1, E_2 \subset E(K_n)$, the event $E_2 \subset E(H)$ is a down-set in terms of $G$
(it says that certain pairs of edges of $G$ are not present), so $E_2 \subset E(H)$ and
$E_1 \subset E(G)$ are negatively correlated. Hence, if $|E_2| = O(1)$, we have

$$\mathbb{P}\bigl(\{E_1 \subset E(G)\} \cap \{E_2 \subset E(H)\}\bigr) \le (1+o(1))\,p_1^{|E_1|}p_2^{|E_2|}. \tag{23}$$

Considering all ways of splitting a set $E$, it follows that $\mathbb{P}(E \subset G \cup H) \le (1+o(1))(p_1+p_2)^{|E|}$,
and hence that

$$\mathbb{E}(s_p(F, G \cup H)) \le 1 + o(1) \tag{24}$$

for any fixed graph $F$, where $p = p_1 + p_2$.

Since $G$ and $H$ overlap in very few edges, and the numbers of edges of $G$ and
of $H$ are concentrated, we have $s_p(K_2, G \cup H) \to 1$ almost surely. It follows that
$s_p(C_4, G \cup H) \ge 1 - o(1)$ almost surely. Hence, from (24), $s_p(C_4, G \cup H) \xrightarrow{p} 1$,
and it is not hard to deduce that $t_p(C_4, G \cup H) \xrightarrow{p} 1$.

On the other hand, there are by definition no triangles with two edges in $G$
and one in $H$. Hence, from (23), the expectation of $\mathrm{emb}(K_3, G \cup H)$ is at most

$$(1+o(1))\,n^3\,(p_1^3 + 0 + 3p_1p_2^2 + p_2^3),$$

so $\mathbb{E}(s_p(K_3, G \cup H)) \le (p^3 - 3p_1^2p_2)/p^3 + o(1)$. Since $p_1$, $p_2$ and $p$ are all of the
same order, this final fraction is bounded away from 1, and our construction gives
almost surely a sequence $G_n = G \cup H$ with $s_p(K_2, G_n) \to 1$, $s_p(C_4, G_n) \to 1$
but $s_p(C_3, G_n) \not\to 1$. Since $\mathrm{emb}(C_3, G) = \hom(C_3, G)$ for any $G$, we have
$t_p(C_3, G_n) \sim s_p(C_3, G_n) \not\to 1$. Choosing $p_1$ and $p_2$ satisfying (22) so that
$p_2 \sim p_1/2$, we may achieve $s_p(C_3, G_n) \to 5/9$. Alternatively, choosing $p_1$ and
$p_2$ suitably, we may find a sequence with $s_p(C_3, G_n) \not\to 1$ for any $p = p(n)$
satisfying $pn^{1/2} \to \infty$ and $p = O(\sqrt{\log n}/\sqrt{n})$.
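
The limiting triangle density in Example 3.11 depends only on the ratio of $p_1$ to $p_2$. A minimal sketch of the arithmetic, with a hypothetical helper name and exact fractions, is:

```python
from fractions import Fraction


def triangle_ratio(p1, p2):
    """Evaluate (p^3 - 3 p1^2 p2) / p^3 with p = p1 + p2.

    This is the upper bound on the limit of s_p(K_3, G u H) from Example 3.11:
    the term 3 p1^2 p2 (two edges in G, one in H) is missing from (p1 + p2)^3.
    """
    p = p1 + p2
    return (p**3 - 3 * p1**2 * p2) / p**3


# Only the ratio p1 : p2 matters, so p2 ~ p1 / 2 can be modelled by (2, 1).
print(triangle_ratio(Fraction(2), Fraction(1)))
```

The bound is strictly below 1 for any fixed positive ratio, which is why $p_1$ and $p_2$ need to be of the same order.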

Example 3.12. Very sparse graphs with no triangles. In the context of
finding explicit constructions giving lower bounds on Ramsey numbers, Alon [1]
constructed a sequence of graphs $G_n$ defined only for certain $n$, with the following
properties, where $d = d(n) \sim n^{2/3}/4$: the graph $G_n$ is a $d$-regular Cayley
graph, it is triangle free and (which is irrelevant here) the largest independent
set has size $O(n^{2/3})$. In proving the last property, Alon shows that all eigenvalues
other than $\lambda_1 = d$ are uniformly bounded by $O(n^{1/3})$. Setting $p = d/n$,
so $t_p(K_2, G_n) = 1$, and writing $\mu_i$ for $\lambda_i/(np)$, as in the proof of Lemma 3.10,
one thus has $\mu_1 = 1$ and $\mu_i = O(n^{-1/3})$ for $i \ge 2$, so from (19) it follows that
$t_p(C_4, G_n) = 1 + O(n^{-1/3}) = 1 + o(1)$. This gives another example of a graph
with almost the minimal number of $C_4$s but too few (in this case no) triangles.

Example 3.13. Denser graphs with too few triangles. Let $n = mk$
where $m \to \infty$, and let $p = \sqrt{\log m}/\sqrt{m}$. Example 3.11 gives us a graph $G'$ of
order $m$ with $t_p(K_2, G'), t_p(C_4, G') \sim 1$ and $t_p(K_3, G') < 0.9$, say, for all large
enough $m$. Let $G$ be the blow-up of $G'$ obtained by replacing each vertex by $k$
vertices. Since $t_p(F, \cdot)$ is unchanged by blow-ups, we have $t_p(K_2, G), t_p(C_4, G) \sim 1$
but $t_p(K_3, G) < 0.9$, from which $s_p(K_2, G), s_p(C_4, G) \sim 1$ and (for $n$ large)
$s_p(K_3, G) < 0.91$ follow immediately.
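
The invariance of $t_p(F, \cdot)$ under blow-ups used in Example 3.13 can be illustrated for cycles, where homomorphism counts are traces of powers of the adjacency matrix. This is a sketch under stated assumptions: the helper names are hypothetical, and since $p$ is fixed it suffices to compare $\hom(C_\ell, G)/|G|^\ell$ before and after the blow-up.

```python
from fractions import Fraction


def blow_up(adj, k):
    """Replace each vertex by k independent copies and each edge by K_{k,k}."""
    n = len(adj)
    return [[adj[i // k][j // k] for j in range(n * k)] for i in range(n * k)]


def hom_cycle(adj, length):
    """hom(C_length, G) computed as the trace of the length-th power of A."""
    n = len(adj)
    power = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(length):
        power = [
            [sum(power[i][l] * adj[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)
        ]
    return sum(power[i][i] for i in range(n))


def cycle_density(adj, length):
    """Homomorphism density hom(C_length, G) / |G|^length."""
    return Fraction(hom_cycle(adj, length), len(adj) ** length)


triangle = [[int(i != j) for j in range(3)] for i in range(3)]  # toy base graph
blown = blow_up(triangle, 2)
for length in (3, 4):
    assert cycle_density(blown, length) == cycle_density(triangle, length)
```

Each homomorphism into the base graph lifts to exactly $k^{|F|}$ homomorphisms into the blow-up, which cancels the factor $k^{|F|}$ gained in the normalization.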

Although $p$ has not changed, the number of vertices has. Seen as a function
of $n$, we may choose $p = \sqrt{\log m}/\sqrt{m}$ for any $m$ dividing $n$ with $m \to \infty$.
Exact divisibility is not essential. Either by using this fact, or by restricting
to a subsequence, we see that any given function $p(n)$ can be realized up to a
factor of $(1+o(1))$, provided $p(n)/(\sqrt{\log n}/\sqrt{n}) \to \infty$ and $p(n) = o(1)$. Hence,
we may construct graphs with the right number of $C_4$s but too few triangles for
any such function $p(n)$.

At first sight Example 3.13 seems to contradict Conjecture 3.9, but this is
not the case. Indeed, for the graph $G'$ that we blow up, (24) tells us that we
do not have too many embeddings of any fixed $F$. However, while $s_p \sim t_p$ for
$p = n^{-o(1)}$, the final $p$ we consider, and while blowing up preserves $t_p$, $G'$ is a
very sparse graph: although it has the same absolute density as the final graph
$G$, this density is much smaller than $|G'|^{-o(1)}$, since $G'$ has many fewer vertices
than $G$. It follows that the homomorphism counts in $G'$ are not well behaved. In
particular, $G'$ contains around $m^4p^3$ non-injective homomorphisms from $K_{2,3}$,
which turns out to be much larger than the number $m^5p^6$ of embeddings. It
follows that $G$ contains too many homomorphisms from, and thus embeddings
of, $K_{2,3}$, i.e., that $s_p(K_{2,3}, G) \to \infty$.

Remark 3.14. Let us note in passing that the blowing-up argument above
shows that replacing the assumption $p = n^{-o(1)}$ in Conjecture 3.9 (or Conjecture 3.3)
with a stronger assumption such as $p(n) \ge 1/\log\log\log n$, say, makes
no difference. Indeed, if the conjecture fails, and $(G_n)$ is a counterexample,
then blowing up $G_n$ as above by replacing each vertex by $f(n)$ vertices for
some rapidly growing $f(n)$ gives a counterexample for a different density function,
where now the density goes to zero extremely slowly as a function of the
number of vertices.

One possible approach to producing a counterexample to Conjecture 3.9
would be to consider circulant graphs, i.e., graphs on the vertex set $[n]$ in which
whether or not $ij$ is an edge depends only on $i - j$ modulo $n$. There is one
circulant graph for each subset $A$ of the integers modulo $n$ satisfying $0 \notin A$ and
$a \in A$ if and only if $-a \in A$. All our conjectures thus imply corresponding
conjectures for subsets of $\mathbb{Z}_n$, the integers modulo $n$, in which the symmetry
condition is not likely to be relevant. Most subgraph counts in the graph have a
rather unnatural interpretation in terms of the corresponding sets; the exception
is cycles, where the number of $k$-cycles in $G$ corresponds to ($n$ times the) number
of $k$-tuples in $A^k$ summing to 0. There is a result corresponding to Lemma 3.10
for subsets of $\mathbb{Z}_n$, proved in the same way but using Fourier coefficients instead
of eigenvalues. Unfortunately, Examples 3.11 and 3.13 also carry over to the
set context, in a fairly straightforward way: instead of blowing up the graph,
we replace each element of $A$ by a block of consecutive integers. This shows
that any result of the kind we want about subsets of $\mathbb{Z}_n$ must involve conditions
other than constraints on the number of tuples summing to 0.

In the sparse case, even when $p = n^{-o(1)}$, it is not true that $s_p(K_2, G_n) \to 1$
and $s_p(C_4, G_n) \to 1$ together imply $s_p(F, G_n) \to 1$ for every $F$. We have just
seen one example, with $F = C_3$. There are also much simpler examples.

Example 3.15. Adding a dense part. Let $p = 1/\log n$, say, and let
$m = m(n) = n/(\log n)^c$ where $c > 0$ is constant. (We ignore rounding to integers.)
Let $G'$ be any graph on $n - m$ vertices, and let $G$ be the disjoint union of $G'$ and
a complete graph on $m$ vertices. Since $K_m$ contains roughly $m^{|F|}$ embeddings
of any fixed $F$, we have

$$s_p(F, G) \sim s_p(F, G') + p^{-e(F)}(m/n)^{|F|} = s_p(F, G') + (\log n)^{e(F) - c|F|}.$$

Taking $G' = G(n-m, p)$ and $c = 3/2$, say, we have $s_p(K_2, G) \sim s_p(K_2, G') \sim 1$,
$s_p(C_4, G) \sim s_p(C_4, G') \sim 1$, but $s_p(K_4, G) \sim 1 + 1 = 2$. Note that $s_p(K_5, G) \to \infty$,
so the assumptions of Conjecture 3.9 are not satisfied.
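
The behaviour of the extra term $(\log n)^{e(F) - c|F|}$ in Example 3.15 is governed by the sign of its exponent. The following sketch tabulates that exponent for the four graphs mentioned, with $c = 3/2$; the function name and the dictionary of graphs are illustrative assumptions.

```python
from fractions import Fraction


def dense_part_exponent(num_vertices, num_edges, c):
    """Exponent of log n in the extra term (log n)^(e(F) - c|F|)."""
    return num_edges - c * num_vertices


c = Fraction(3, 2)
# (number of vertices, number of edges) for each fixed graph F.
graphs = {"K2": (2, 1), "C4": (4, 4), "K4": (4, 6), "K5": (5, 10)}
exponents = {name: dense_part_exponent(v, e, c) for name, (v, e) in graphs.items()}
for name, exponent in exponents.items():
    # Negative: the clique is invisible; zero: it adds 1; positive: it blows up.
    print(name, exponent)
```

A negative exponent means the added clique contributes $o(1)$, a zero exponent means it contributes exactly 1 in the limit, and a positive exponent means the normalized count is unbounded.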

The above example is rather artificial: there are too many copies of $K_4$ (and
of $K_5$), but these sit on a small number of vertices. However, the same effect
can be achieved by taking the union on the same vertex set of $G(n, p)$ and a
disjoint union of $n/m$ copies of $K_m$. Also, we can use complete bipartite graphs
instead of complete graphs.

Example 3.16. A blown-up random graph. Let $n = mk$, where $k = k(n)$
and $m = m(n)$ both tend to infinity. (As usual, we ignore divisibility issues,
or consider a sequence $n_j \to \infty$.) Let $G_1$ be the random graph $G(m, p)$, where
$p = p(n)$, and let $G = G_1^{(k)}$ be formed by replacing each vertex of $G_1$ by an
independent set of size $k$, and each edge by a $k$-by-$k$ complete bipartite graph.
The number of edges of $G$ is $k^2e(G_1)$, which is asymptotically $k^2m^2p/2 = n^2p/2$,
so $s_p(K_2, G) \to 1$ in probability and almost surely. Similarly, for any
fixed graph $F$, each embedding of $F$ into $G_1$ gives rise to $k^{|F|}$ embeddings
into $G$; the expected number of embeddings arising in this way is essentially
the expected number in $G(n, p)$, so whenever this expectation tends to infinity,
such embeddings will contribute $1 + o(1)$ to $s_p(F, G)$.

There are other embeddings of $F$ into $G$, however, where some distinct vertices
of $F$ are mapped to the same vertex in $G_1$. For $C_4$, we have roughly $m^2pk^4$
such embeddings within our complete bipartite graphs, and roughly $2m^3p^2k^4$
from embeddings involving three vertices of $G_1$. Provided $mp^2 \to \infty$, we still
have $s_p(C_4, G) \to 1$.

Fix an integer $t \ge 3$, and suppose now that $m = m(n)$ and $p = p(n)$
are chosen so that $m$ and $k = n/m \to \infty$, and $mp^t \to c$ for some constant
$0 < c < \infty$; for example, set $p = 1/\log n$, $m = c(\log n)^t$ and $k = c^{-1}n/(\log n)^t$.
Note that $mp^2 \to \infty$. Then we have roughly $m^{2+t}k^{2+t}p^{2t}$ embeddings of $K_{2,t}$
into $G$ coming from embeddings into $G_1$. But we also have roughly $m^{1+t}k^{2+t}p^t$
embeddings into $G$ coming from maps from $K_{2,t}$ into $G_1$ sending the two vertices
on one side to the same vertex. It is easy to check that these two are the
dominant terms (mapping the two vertices on one side to the same place we gain
$t$ factors of $1/p$ and lose one factor of $m$; any other identifications gain fewer
factors of $1/p$ per factor of $m$ lost), and it follows that $s_p(K_{2,t}, G) \to 1 + 1/c$.
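
For reference, the limit $1 + 1/c$ follows by dividing the two dominant terms by the normalizing factor $n^{|K_{2,t}|}p^{e(K_{2,t})} = n^{2+t}p^{2t}$. The display below is a worked restatement of that step, assuming only the two counts quoted above and $n = mk$:

```latex
s_p(K_{2,t}, G)
  \approx \frac{m^{2+t}k^{2+t}p^{2t} + m^{1+t}k^{2+t}p^{t}}{n^{2+t}p^{2t}}
  = 1 + \frac{1}{m p^{t}}
  \longrightarrow 1 + \frac{1}{c},
\qquad \text{since } n = mk \text{ and } mp^{t} \to c .
```

The same computation with $t$ replaced by $t' < t$ gives a correction $1/(mp^{t'}) \to 0$, and with $t + 1$ it gives $1/(mp^{t+1}) \to \infty$.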

Taking a 'typical' sequence of random graphs constructed as above gives an
example with $s_p(K_2, G_n) \to 1$, $s_p(C_4, G_n) \to 1$ (and indeed $s_p(K_{2,t'}, G_n) \to 1$
for $2 \le t' < t$), but $s_p(K_{2,t}, G_n) \to 1 + 1/c \ne 1$. Once again, the assumptions
of Conjecture 3.9 are not satisfied, this time because $s_p(K_{2,t+1}, G_n) \to \infty$.

We have seen from the examples above that if $p(n) \to 0$, then $s_p(K_2, G_n) \to 1$
and $s_p(C_4, G_n) \to 1$ do not themselves imply that $s_p(F, G_n) \to 1$ for every $F$.
However, attempted counterexamples to Conjecture 3.9 seem to be doomed to
failure by the additional assumption that $s_p(F, G_n)$ is bounded for every
$F$. In the next section we shall see that we can make some progress towards
proving Conjecture 3.9.

3.4 Partial results in the almost dense, uniform case 

In the examples in the previous subsection, each vertex is in about the same
number of copies of any fixed graph $F$, but there are relatively few ($o(n^2)$)
pairs that are in too many copies of $K_{2,t}$, for example. It is easy to see that,
under the assumptions of Conjecture 3.9, this cannot happen. In fact, we can
make a much more general statement. For this it is convenient to work with
homomorphism counts and $t_p(F, G_n)$ rather than embeddings and $s_p(F, G_n)$.
As noted earlier, in the almost dense case that we consider in this subsection,
i.e., when $p = n^{-o(1)}$, the quantities $t_p(F, G_n)$ and $s_p(F, G_n)$ differ by $o(1)$.

Let $F$ be a fixed graph, and $F'$ a subgraph of $F$. Without loss of generality,
suppose that $V(F') = [\ell] \subset [k] = V(F)$. Then any homomorphism $\phi_F : F \to G_n$
restricts to a homomorphism $\phi_{F'} : F' \to G_n$. With $e(G_n) \sim p\binom{n}{2}$, we expect
a typical $\phi_{F'}$ to have around $n^{k-\ell}p^{e(F)-e(F')}$ extensions. For each $n$, let us
define a random variable $Z_n(F', F)$ as follows: let $\phi_{F'}$ be chosen uniformly at
random from among all homomorphisms from $F'$ into $G_n$ (if there are any), and
let $Z_n(F', F)$ be the number of extensions of $\phi_{F'}$ divided by $n^{k-\ell}p^{e(F)-e(F')}$.
(The reader may well prefer to picture copies of $F'$ and $F$ in $G_n$ rather than
homomorphisms. In fact, it is better to picture embeddings, i.e., labelled copies.
There are essentially the same number of these as of homomorphisms.) Since
$\hom(F, G_n)$ is the sum over $\phi_{F'}$ of the number of extensions, we have

$$\hom(F, G_n) = \hom(F', G_n)\,\mathbb{E}(Z_n(F', F))\,n^{k-\ell}p^{e(F)-e(F')},$$

and hence

$$t_p(F, G_n) = t_p(F', G_n)\,\mathbb{E}(Z_n(F', F)).$$

For $r \ge 2$, let $rF/F'$ denote the graph formed by the union of $r$ copies of
$F$ which all meet in the same subgraph $F'$, so $rF/F'$ has $|F'| + r(|F| - |F'|)$
vertices and $e(F') + r(e(F) - e(F'))$ edges. A homomorphism from $rF/F'$ to
$G_n$ consists of a homomorphism $\phi$ from $F'$ to $G_n$ together with $r$ extensions of
$\phi$ to homomorphisms from $F$ to $G_n$, which may or may not be distinct. (They
almost always will be.) Since we have normalized by the right powers of $n$ and
$p$, it follows that

$$t_p(rF/F', G_n) = t_p(F', G_n)\,\mathbb{E}(Z_n(F', F)^r). \tag{25}$$

Let $\mu_F = \mu_F(n) = n^{|F|}p^{e(F)}$, which is asymptotically equal to the expected
number of homomorphisms from $F$ into $G(n, p)$. Then, under the assumptions
of any of Conjectures 3.3, 3.4 and 3.9, it is easy to see that for $F' \subset F$, any
$o(\mu_{F'})$ copies of $F'$ meet $o(\mu_F)$ copies of $F$. (Here 'copies' may be subgraphs
of $G_n$, embeddings, or homomorphisms; it makes no difference.) Otherwise
$t_p(2F/F', G_n)$ would not remain bounded. This rules out any construction
of a potential counterexample similar to those above; it also shows that if
$t_p(K_2, G_n) \to 0$ and Assumption 3.1 holds (i.e., $(G_n)$ has bounded subgraph
counts), then $t_p(F, G_n) \to 0$ for every $F$.

Conjecture 3.9 states that infinitely many conclusions (one for each $F$) hold
under the same assumptions. We have already proved some of these conclusions,
with $F = C_k$, $k \ge 5$. Our next aim is to prove a corresponding result for a much
wider class of graphs. In doing so, the following observation will be useful.

Lemma 3.17. Let $X_n \ge 0$ be a sequence of random variables with $\sup_n \mathbb{E}(X_n^k) < \infty$
for every $k \ge 1$. Then $\mathbb{E}(X_n^k) \to 1$ for every $k$ if and only if $X_n \xrightarrow{p} 1$.

Proof. For the forward implication we have $\mathbb{E}(X_n) \to 1$ and $\mathbb{E}(X_n^2) \to 1$; applying
Chebyshev's inequality it follows that $X_n \xrightarrow{p} 1$. The reverse implication is
not much harder. Suppose that $X_n \xrightarrow{p} 1$, but that $\mathbb{E}(X_n^k) \not\to 1$ for some $k$. For
any $M$, the variables $X_n^k 1_{X_n \le M}$ are uniformly bounded and converge in probability
to 1, so $\mathbb{E}(X_n^k 1_{X_n \le M}) \to 1$. It follows that there is some $M(n) \to \infty$ such
that $\mathbb{E}(X_n^k 1_{X_n \le M(n)}) \to 1$. But then $\mathbb{E}(X_n^k 1_{X_n > M(n)}) \not\to 0$, so

$$\mathbb{E}(X_n^{k+1}) \ge \mathbb{E}(X_n^{k+1} 1_{X_n > M(n)}) \ge M(n)\,\mathbb{E}(X_n^k 1_{X_n > M(n)})$$

is unbounded, contradicting our assumptions. □

Corollary 3.18. Under the assumptions of Conjecture 3.9, if $F'$ and $F$ are
fixed graphs with $F' \subset F$ and $t_p(F', G_n) \to 1$, then $Z_n(F', F) \xrightarrow{p} 1$ if and only
if $t_p(rF/F', G_n) \to 1$ for every $r \ge 1$.

Proof. Apply Lemma 3.17 to the random variable $Z_n(F', F)$, using (25) to evaluate
its moments. □

We shall say that the distribution of $F$ is flat over that of $F'$ in $G_n$, or simply
that $F$ is flat over $F'$, if $Z_n(F', F) \xrightarrow{p} 1$.

Lemma 3.19. Under the assumptions of Conjecture 3.9 we have $s_p(K_{s,t}, G_n) \to 1$
for all $s, t \ge 1$. Moreover, $K_{1,s}$ is flat over $E_s$, where $E_s$ is the empty subgraph
of $K_{1,s}$ induced by the vertices in the second part.

Proof. Let $d_1, \dots, d_n$ denote the degrees of the vertices of $G_n$, and $d$ the average
degree. Fix $s \ge 1$. By convexity, we have

$$\hom(K_{1,s}, G_n) = \sum_{i=1}^{n} d_i^s \ge nd^s,$$

which we can rewrite as $t_p(K_{1,s}, G_n) \ge t_p(K_2, G_n)^s$. Since $t_p(K_2, G_n) \to 1$ by
assumption, this gives

$$t_p(K_{1,s}, G_n) \ge 1 + o(1). \tag{26}$$

Specializing to $s = 2$ for the moment, let $Z_n = Z_n(E_2, K_{1,2})$ be the random
variable describing the distribution of the number of common neighbours of a
random pair of vertices of $G_n$. For any empty graph we have $t_p(E_k, G_n) = 1$.
Hence, from (25) and (26),

$$\mathbb{E}(Z_n) = t_p(K_{1,2}, G_n) \ge 1 + o(1).$$

On the other hand, since $tK_{1,2}/E_2 = K_{2,t}$,

$$\mathbb{E}(Z_n^2) = t_p(K_{2,2}, G_n) = t_p(C_4, G_n) \to 1.$$

Since $\mathbb{E}(Z_n^2) \ge \mathbb{E}(Z_n)^2$, it follows that $\mathbb{E}(Z_n) \to 1$ and (by Lemma 3.17) that
$Z_n \xrightarrow{p} 1$. In other words, $K_{1,2}$ is flat over pairs of vertices. By Corollary 3.18 it
then follows that $t_p(K_{2,t}, G_n) \to 1$ for every $t$.

Returning to general $s$, let $W_n = Z_n(E_s, K_{1,s})$. From (26) we have $\mathbb{E}(W_n) = t_p(K_{1,s}, G_n) \ge 1 + o(1)$.
But we have just shown that $\mathbb{E}(W_n^2) = t_p(K_{2,s}, G_n) \to 1$,
so $W_n \xrightarrow{p} 1$, i.e., $K_{1,s}$ is flat over $E_s$. Applying Corollary 3.18 again we thus
have $t_p(K_{s,t}, G_n) \to 1$ for every $t$, as required. □

Theorem 3.20. Let $F$ be any fixed graph with girth at least 4, and let $F' \ne F$
be any induced subgraph of $F$. Under the assumptions of Conjecture 3.9, $F$ is
flat over $F'$. Furthermore, $s_p(F, G_n), t_p(F, G_n) \to 1$ as $n \to \infty$.

Proof. Note first that the definition of $Z_n(F', F)$ makes perfect sense when $F'$
is the empty 'graph' with no vertices; there is one homomorphism from $F'$ to
$G_n$, and $Z_n(F', F)$ is constant and takes the value $t_p(F, G_n)$. Hence, $F$ is flat
over the empty subgraph means exactly that $t_p(F, G_n) \to 1$. Since $p = n^{-o(1)}$,
we have $s_p(F, G_n) \sim t_p(F, G_n)$, so it suffices to prove the first statement.

We prove the first statement of the theorem by induction on $|F|$. If $|F| = 1$,
there is nothing to prove. Suppose then that $F$ and $F'$ are given, with $|F| \ge 2$,
and that the result holds for all smaller $F$.

Suppose first that $F' = F - v$ for some vertex $v$ of $F$. Let $E_s$ denote the
subgraph of $F'$ induced by the neighbours of $v$, noting that $E_s$ has no edges, as
$F$ is triangle free. Set $X_n = Z_n(E_s, F')$ and $Y_n = Z_n(E_s, K_{1,s})$. Note that these
random variables are defined on the same probability space: the elements of this
space are simply $s$-tuples of vertices of $G_n$. If $F' = E_s$, then $F'$ is trivially flat
over $E_s$. If not, then $F'$ is flat over $E_s$ by the induction hypothesis. Hence, in
either case, $\mathbb{E}(X_n^k) \to 1$ for every $k$. By the last part of Lemma 3.19, $K_{1,s}$ is
flat over $E_s$, so $\mathbb{E}(Y_n^k) \to 1$ for every $k$. It follows that $\mathbb{E}((X_n - 1)^k) \to 0$ and
$\mathbb{E}((Y_n - 1)^k) \to 0$ for all $k \ge 1$. Hence, by the Cauchy-Schwarz inequality,

$$\bigl|\mathbb{E}\bigl((X_n - 1)^k(Y_n - 1)^\ell\bigr)\bigr| \le \sqrt{\mathbb{E}\bigl((X_n - 1)^{2k}\bigr)\,\mathbb{E}\bigl((Y_n - 1)^{2\ell}\bigr)} \to 0$$

for all $k, \ell \ge 0$ with $k + \ell > 0$. Writing $\mathbb{E}(X_n^kY_n^\ell) = \mathbb{E}((X_n - 1 + 1)^k(Y_n - 1 + 1)^\ell)$
as 1 plus a sum of constant multiples of terms $\mathbb{E}((X_n - 1)^{k'}(Y_n - 1)^{\ell'})$, $k', \ell' \ge 0$, $k' + \ell' > 0$,
it follows that $\mathbb{E}(X_n^kY_n^\ell) \to 1$ for any $k, \ell \ge 0$.

Any homomorphism $\phi_{F'}$ from $F'$ into $G_n$ is the extension of a unique
homomorphism $\phi_{E_s}$ from $E_s$ into $G_n$. Furthermore, to extend $\phi_{F'}$ to $F$ we must
choose for the image of $v$ a common neighbour of the vertices in the image of
$\phi_{E_s}$. Hence, the value of $Z_n = Z_n(F', F)$ on $\phi_{F'}$ is simply the value of $Y_n$ on
$\phi_{E_s}$. Choosing $\phi_{F'}$ uniformly at random, to obtain the correct distribution for
$Z_n$, the probability of obtaining a particular restriction $\phi_{E_s}$ is proportional to
the number of extensions of $\phi_{E_s}$ to $F'$, i.e., to $X_n$. Thus the distribution of $Z_n$
is that of $Y_n$ 'size biased' by $X_n$. In particular,

$$\mathbb{E}(Z_n^k) = \frac{\mathbb{E}(X_nY_n^k)}{\mathbb{E}(X_n)} \to 1.$$

Taking $k = 1, 2$, it follows that $Z_n \xrightarrow{p} 1$, i.e., that $F$ is flat over $F'$, as required.

It remains to handle the case $|F| - |F'| \ge 2$. In this case, we can find
an induced subgraph $F'' = F - v$ of $F$ with $F' \subset F'' \subset F$. Note that
$t_p(F', G_n), t_p(F'', G_n) \sim 1$ by induction, that $F''$ is flat over $F'$ by induction,
and that $F$ is flat over $F''$ by the case treated above. In particular, we certainly
have

$$t_p(F, G_n) = t_p(F'', G_n)\,\mathbb{E}(Z_n(F'', F)) \sim 1.$$

Fix $\varepsilon > 0$. Let us call a copy of $F''$ (more precisely, a homomorphism from $F''$
into $G_n$) bad if it has fewer than $(1-\varepsilon)\mu_F/\mu_{F''}$ extensions to copies of $F$. Since
$F$ is flat over $F''$ and $t_p(F'', G_n) \sim 1$, there are fewer than $\varepsilon^2\mu_{F''}$ bad copies of
$F''$ if $n$ is large enough. Since each copy of $F''$ extends a unique copy of $F'$, it
follows that at most $\varepsilon\mu_{F'}$ copies of $F'$ have more than $\varepsilon\mu_{F''}/\mu_{F'}$ extensions to
bad copies of $F''$.

Let $B_1$ denote the set of copies of $F'$ that have more than $\varepsilon\mu_{F''}/\mu_{F'}$ extensions
to bad copies of $F''$, so $|B_1| \le \varepsilon\mu_{F'}$ if $n$ is large. Let $B_2$ denote the set
of copies of $F'$ that have fewer than $(1-\varepsilon)\mu_{F''}/\mu_{F'}$ extensions to copies of $F''$.
Since $F''$ is flat over $F'$, we have $|B_2| \le \varepsilon\mu_{F'}$ if $n$ is large enough, which we
assume from now on. If $\phi$ is a copy of $F'$ not in $B_1 \cup B_2$, then $\phi$ has at least
$(1-2\varepsilon)\mu_{F''}/\mu_{F'}$ extensions to good copies of $F''$, which in turn have at least
$(1-\varepsilon)\mu_F/\mu_{F''}$ extensions to copies of $F$, so the value of $Z_n(F', F)$ on $\phi$ is at
least $(1-2\varepsilon)(1-\varepsilon)$. Since there are $(1+o(1))\mu_{F'}$ copies of $F'$ in total, the
proportion of these copies in $B_1 \cup B_2$ is at most $2\varepsilon + o(1)$. Since $\varepsilon > 0$ was arbitrary,
it follows that the negative part of $Z_n(F', F) - 1$ tends to zero in probability.
Since $\mathbb{E}(Z_n(F', F)) = t_p(F, G_n)/t_p(F', G_n) \to 1$, it follows that $Z_n(F', F) \xrightarrow{p} 1$,
i.e., that $F$ is flat over $F'$. □

The reader may find many of the arguments above familiar from the dense
case; for example, the proof for $K_{s,t}$ is an absolutely standard convexity argument.
The key point is that many arguments for the dense case do not carry
over. In particular, we have shown that almost all, i.e., all but $o(n^2)$, pairs
of vertices have about the right number of common neighbours. In the dense
case, it follows immediately that almost all (all but $o(pn^2) = o(n^2)$) edges are
in the right number of triangles, and hence that $t_p(K_3, G_n) \to 1$. Similarly, the
proof above shows that any $F$ is flat over all its subgraphs in the dense case,
without restriction to girth at least 4. In the sparse case, there are only $o(n^2)$
edges, and there seems to be no simple way to rule out the possibility that a
large fraction, or even all, of the pairs of vertices corresponding to edges fall
in the $o(n^2)$ set with too few common neighbours. Nevertheless, we conjecture
that this cannot happen. The simplest graph for which we cannot prove the
conclusion of Conjecture 3.9 is the triangle.

Conjecture 3.21. Under the conditions of Conjecture 3.9 we have $s_p(K_3, G_n) \to 1$.

In fact, we do not even have a proof that G n must contain at least one 
triangle for n large enough! 

3.5 Extensions to lower densities. 

Let us return to the study of general subgraphs $F$, rather than simply triangles.
If true, the various conjectures above may extend to smaller values of $p$, but
one must be careful. Firstly, $s_p$ and $t_p$ no longer coincide, as noted above.
One should work with $s_p$, because these quantities behave in the right way for
$G_p(n, \kappa)$, while $t_p$ does not. A simple modification of the proof of Lemma 3.19,
considering the distribution of the number of common neighbours of a set of $s$
distinct vertices, shows that if $np^s \to \infty$, then $s_p(K_2, G_n) \to 1$, $s_p(C_4, G_n) \to 1$
and $s_p(K_{s,t+1}, G_n)$ bounded together imply $s_p(K_{s,t}, G_n) \to 1$. Taking $p = n^{-\alpha}$,
with $0 < \alpha < 1/2$ constant, there is no corresponding result for $t_p$, even with
$s = 2$. Indeed, if $t_p(K_2, G_n) = 1$, then there are at least $n^{t+1}p^t$ homomorphisms
from $K_{2,t}$ into $G_n$ mapping the two vertices in the smaller class to the same
vertex. It follows that $t_p(K_{2,t}, G_n)$ will be unbounded for any $t > 1/\alpha$.

Secondly, even working with $s_p$ rather than $t_p$, we cannot in general hope to
conclude in the analogue of Conjecture 3.4 that $s_p(F, G_n) \to s(F, \kappa)$ for all fixed
graphs $F$. For example, set $p = n^{-1/2}$ and consider the polarity graphs $G_n$ of
Erdős and Rényi [33], defined (for suitable $n$) by taking as vertices the points of
the projective plane over $GF(q)$, $q$ a prime power, and joining $x = (x_0, x_1, x_2)$
and $y = (y_0, y_1, y_2)$ if and only if $x_0y_0 + x_1y_1 + x_2y_2 = 0$ in $GF(q)$. These
graphs satisfy $e(G_n) \sim n^{3/2}/2 = pn^2/2$ but contain no $C_4$s, and thus satisfy
$s_p(K_2, G_n) \to 1$ and $s_p(C_4, G_n) = 0$. Since $s(C_4, \kappa) \ge s(K_2, \kappa)^4$ for any $\kappa$, we
cannot have $s_p(F, G_n) \to s(F, \kappa)$ for $F = K_2$ and for $F = C_4$ in this case. More
generally, whenever $pn^{1/2} \not\to \infty$, then there are graphs $G_n$ with $pn^2$ edges but
too few $C_4$s, so we should only consider the counts $s_p(C_4, G_n)$ if $pn^{1/2} \to \infty$.
This problem is not unique to $C_4$, so it seems that to extend our conjectures for
$p = n^{-o(1)}$ to sparser graphs, we should modify them to refer only to a certain
set of 'admissible' subgraphs $F$, depending on the function $p = p(n)$.

In fact, we should only consider subgraphs $F$ for which the expected number
$\mu_F \sim n^{|F|}p^{e(F)}$ of embeddings of $F$ into $G(n, p)$ is much larger than the number
$(1+o(1))n^2p/2$ of edges, at least if $pn^{1/2} \to \infty$. To see this, first suppose
that $n^{|F|}p^{e(F)} \sim An^2p$, for some constant $0 < A < \infty$. Form a graph $G'$ from
$G = G(n, p)$ by adding $\varepsilon n^2p/(2e(F))$ copies $F_1, F_2, \dots$ of $F$, chosen uniformly
at random from all subgraphs of $K_n$ isomorphic to $F$. After deleting the small
number of duplicate edges, we have added around $\varepsilon n^2p/2$ edges, so $s_p(K_2, G') \sim 1 + \varepsilon$.
It is easy to check that the number of $C_4$s in $G'$ containing two or more
edges from one single $F_i$ is negligible and thus, considering $C_4$s formed from
all combinations of edges from $G(n, p)$ and from different $F_i$, that $s_p(C_4, G') \sim (1+\varepsilon)^4$
whp. Hence, the appropriate limiting kernel is the constant kernel
$\kappa = 1 + \varepsilon$. Copies of $F$ itself containing at most one edge from each $F_i$ contribute
$(1+\varepsilon)^{e(F)}$ to $s_p(F, G')$, but there are $\Theta(n^{|F|}p^{e(F)})$ extra copies of $F$, namely
the $F_i$ themselves. It follows that $s_p(F, G_n) \not\to 1$. If $n^{|F|}p^{e(F)} = o(n^2p)$, then
the argument is much simpler: adding a few copies of $F$ to $G(n, p)$ does not
change the number of edges or $C_4$s significantly, but does change the number of
copies of $F$.

We can go somewhat further: the construction in Example 3.11 shows that for $C_3$ to be admissible, the expected number of $C_3$s per edge should be larger than $\log n$. A similar construction can be carried out for any fixed $F$, and shows that, at least for suitable balanced $F$, we should require $n^{|F|}p^{e(F)}/(n^2p \log n) \to \infty$ for $F$ to be admissible. In general, for $F$ to be admissible, we need all induced subgraphs $F'$ of $F$ to be admissible; otherwise, the distribution of copies of $F$ over $F'$ cannot be flat as we expect in the uniform case.

Returning to triangles, in the light of the comments above, perhaps the strongest conceivable extension of Conjecture 3.21 to smaller $p$ would be that if $p = p(n) = \omega(\sqrt{\log n}/\sqrt{n})$, and $s_p(K_2, G_n) \to 1$, $s_p(C_4, G_n) \to 1$, and $\sup_n s_p(K_{2,t}, G_n) < \infty$ for each $t$, then $s_p(C_3, G_n) \to 1$. However, it may well be that the graphs constructed by Alon [T] mentioned in Example 3.12 have $s_p(K_{2,t}, G_n) \to 1$ for each $t$. (This may also be true of Kim's random construction [3D] giving his famous lower bound on the Ramsey numbers $R(3,t)$.) If so, blowing these graphs up as in Example 3.13 would show that even in the almost dense case, controlling the $K_{2,t}$ counts is not enough, so one should control (at least) the $K_{s,t}$ counts for some larger $s$. Returning to much sparser graphs, we then have to limit ourselves to $p = p(n)$ for which $K_{s,t}$ is admissible, suggesting the following conjecture.

Conjecture 3.22. There are constants $s \ge 2$ and $a > 0$ such that, if $p = p(n) = \omega((\log n)^a n^{-1/s})$ and $G_n$ is a sequence of graphs with $|G_n| = n$, $s_p(K_2, G_n) \to 1$, $s_p(C_4, G_n) \to 1$, and $\sup_n s_p(K_{s,t}, G_n) < \infty$ for each $t$, then $s_p(C_3, G_n) \to 1$.

It may be that if the conjecture holds for a given $s$, it holds with $a = 1/s$. It may also be that one needs to control the counts for $K_{s,t}$ and at the same time to consider $p$ larger than $n^{-\beta}$ for some $\beta < 1/s$.

There is a potential pitfall in handling subgraph counts when $p$ is smaller than $n^{-o(1)}$: in proving that $s_p(F, G_n) \to 1$ for various graphs $F$ above, we made use of the assumption that $s_p(F', G_n)$ is bounded for other graphs $F'$. In particular, with $F = K_{2,t}$, we used this assumption for $F' = K_{2,t+1}$. It may be that $F'$ is admissible whenever $F$ is (as is likely in this case: $K_{2,t}$ should be admissible as soon as $C_4$ is), but perhaps not. In the latter case we may be forced to work with a larger admissible set for which we impose the hypothesis of Conjecture 3.3 (or Conjecture 3.4), and a smaller set for which we obtain the conclusion. In any case, the (smaller) admissible set should have the following property: if $\mathcal{F}_\alpha$ denotes the set of admissible graphs when $p = n^{-\alpha}$, $\alpha > 0$, then the sets $\mathcal{F}_\alpha$ should increase as $\alpha$ decreases, and their union should contain all finite graphs. We shall return to this question in Section 5, in particular in Subsections 5.3 and 5.4, where we prove results that are steps towards (non-uniform) versions of the various conjectures in this section.

4 Szemerédi's Lemma and the cut metric

In the next section we shall discuss the relationship between the cut and count metrics. As in the dense case, a key tool in the study of the cut metric is some variant of Szemerédi's Lemma [36]; this will be discussed in this section. Unlike in the dense case, we need an assumption on the graphs we consider to make this useful; roughly speaking, our assumption is that no subgraph of $G_n$ containing a constant fraction of the vertices has density more than a constant factor larger than it should have. Several of the usual proofs of Szemerédi's Lemma extend easily to the sparse case under this assumption; this was noted independently by Kohayakawa and Rödl; see [35]. (The much earlier Theorem 2 of Kohayakawa [31] is slightly different.)

Throughout this section, $p = p(n)$ with $p = o(1)$ and $np \to \infty$. (Often, $n^2p \to \infty$ is enough in the proofs, but see Remark 4.4.) As before, $(G_n)$ always denotes a sequence of graphs with $|G_n| = n$, which need not be defined for all $n$, but only for some infinite set.

For disjoint sets $A$, $B$ of vertices of a graph $G = G_n$ with $n$ vertices, we write $e_G(A, B)$ for the number of edges of $G$ joining $A$ to $B$, and

$$d_p(A, B) = \frac{e_G(A, B)}{p|A||B|} \qquad (27)$$

for the normalized density of $G$ between $A$ and $B$. It is convenient to extend this definition to sets $A$ and $B$ that need not be disjoint: in this case, we write $e_G(A, B)$ for the number of ordered pairs $(i, j)$ with $i \in A$, $j \in B$ and $ij \in E(G)$; we then define $d_p(A, B)$ as above. Note that $e_G(A, A) = 2e(G[A])$. We shall make the following assumption:

Assumption 4.1 (bounded density). There is a constant $C$ and a function $n_0(\varepsilon)$ such that, for every $\varepsilon > 0$ and $n \ge n_0(\varepsilon)$, and any $A, B \subset V(G_n)$ with $|A|, |B| \ge \varepsilon n$, we have $d_p(A, B) \le C + \varepsilon$.

It suffices to impose this assumption only when $A = B$, replacing $C$ by $C/2$ and $\varepsilon$ by $\varepsilon/2$. Indeed, if $|A|, |B| \ge \varepsilon n$, $n \ge n_0(\varepsilon)$, and $d_p(A, B) > C + \varepsilon$ then, by averaging, we may find $A' \subset A$ and $B' \subset B$ with $|A'| = |B'| = \lceil \varepsilon n \rceil$ such that $d_p(A', B') > C + \varepsilon$. Then $e_G(A' \cup B', A' \cup B') \ge 2e_G(A', B') > 2(C + \varepsilon)p|A'|^2 \ge (C/2 + \varepsilon/2)p|A' \cup B'|^2$.

The condition above may be written more compactly as follows:

$$\forall \varepsilon > 0: \quad \limsup_{n \to \infty} \max\{d_p(A, B) : A, B \subset V(G_n),\ |A|, |B| \ge \varepsilon n\} \le C. \qquad (28)$$

Note that we shall often assume that (28) holds for a particular value of $C$: in this case, we say that $(G_n)$ has density bounded by $C$. This is the reason for including the final $+\varepsilon$ in Assumption 4.1.

It will be convenient to phrase the proof of Szemerédi's Lemma in terms of kernels. In this sparse setting, the way in which we associate a kernel to a graph is different from in the dense case. Indeed, our aim is that the random graph $G(n,p)$ should approximate the constant kernel taking value 1. For this reason, to a graph $G$ with $n$ vertices $1, 2, \ldots, n$ we associate the kernel $\kappa_G$ taking the value $1/p$ on each square $((i-1)/n, i/n] \times ((j-1)/n, j/n]$ whenever $ij \in E(G)$, and zero elsewhere. This association will often be implicit: for example, given a graph $G$ and a kernel $\kappa$, we write $d_{\mathrm{cut}}(G, \kappa)$ for $d_{\mathrm{cut}}(\kappa_G, \kappa)$.
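
The normalized density (27) and the kernel associated to a graph can be made concrete as follows. This is a minimal sketch, assuming $G$ is given as a symmetric 0/1 NumPy adjacency matrix with zero diagonal and that vertex sets are lists of indices; the function names are illustrative and not taken from the paper.

```python
import numpy as np

def normalized_density(adj, A, B, p):
    """d_p(A, B) = e_G(A, B) / (p |A| |B|), with e_G counting ordered pairs."""
    e = adj[np.ix_(A, B)].sum()
    return e / (p * len(A) * len(B))

def kernel_matrix(adj, p):
    """Values of kappa_G on the n-by-n grid of squares of side 1/n."""
    return adj / p

def kernel_integral(K, A, B):
    """Integral of the step kernel K over the subset of [0,1]^2 given by A x B."""
    n = K.shape[0]
    return K[np.ix_(A, B)].sum() / n**2

# Triangle K_3 with p = 1/2: e_G(A, A) = 2 e(G[A]) = 6 for A = V(G).
adj = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
p = 0.5
V = [0, 1, 2]
print(normalized_density(adj, V, V, p))
```

With this normalization the integral of $\kappa_G$ over $A \times B$ equals $d_p(A, B)\mu(A)\mu(B)$, where $\mu(A) = |A|/n$.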

The following observation shows the importance of bounded density. In the 
proof, and throughout this section, given a subset A of the vertices of a graph 
G, we shall often abuse notation by also writing A for the corresponding subset 
of [0,1]. 

Lemma 4.2. Let $p = p(n)$ be any function of $n$, let $\kappa : [0,1]^2 \to [0, C]$ be a kernel, and let $(G_n)$ be a sequence of graphs with $|G_n| = n$ and $d_{\mathrm{cut}}(G_n, \kappa) \to 0$. Then $(G_n)$ has density bounded by $C$.

Proof. Suppose that $(G_n)$ does not have density bounded by $C$. Then there is an $\varepsilon > 0$ such that, for infinitely many $n$, there are sets $A_n, B_n \subset V(G_n)$ with $|A_n|, |B_n| \ge \varepsilon n$ and $d_p(A_n, B_n) \ge C + \varepsilon$. Identifying $A_n$ and $B_n$ with subsets of $[0,1]$, and writing $\mu$ for Lebesgue measure, we have

$$\int_{A_n \times B_n} \kappa_{G_n} = d_p(A_n, B_n)\mu(A_n)\mu(B_n) \ge (C + \varepsilon)\mu(A_n)\mu(B_n).$$

Since $\kappa$ is bounded by $C$, it follows that

$$\int_{A_n \times B_n} \left(\kappa_{G_n} - \kappa^{(\tau)}\right) \ge \varepsilon\mu(A_n)\mu(B_n) \ge \varepsilon^3$$

for any rearrangement $\kappa^{(\tau)}$ of $\kappa$, which contradicts $d_{\mathrm{cut}}(G_n, \kappa) \to 0$. □

4.1 Weakly regular partitions

If $G$ is a graph with vertex set $\{1, 2, \ldots, n\}$, and $\Pi = (P_1, \ldots, P_k)$ is a partition of $V(G)$, then we write $G/\Pi$ for the kernel on $[0,1]^2$ taking the value $d_p(P_a, P_b)$ on the union of the squares $((i-1)/n, i/n] \times ((j-1)/n, j/n]$, $i \in P_a$, $j \in P_b$. We say that a partition $\Pi$ of a graph $G$ is weakly $(\varepsilon, p)$-regular if $\|\kappa_G - G/\Pi\|_{\mathrm{cut}} \le \varepsilon$. Note that the normalizing function $p$ comes in via the definition of the kernels $\kappa_G$ and $G/\Pi$.

For a kernel $\kappa$, the definitions are similar: for $A, B \subset [0,1]$ we write $\kappa(A, B)$ for the integral of $\kappa$ over $A \times B$, and

$$d(A, B) = d_\kappa(A, B) = \frac{\kappa(A, B)}{\mu(A)\mu(B)}$$

for the average value of $\kappa$ on $A \times B$. Then $d_p(A, B)$, defined using $G$, is exactly $d(A, B)$, defined using $\kappa_G$, so the kernel $G/\Pi$ is obtained from $\kappa_G$ by replacing the value at each point by the average over the relevant rectangle $P_a \times P_b$. For $\kappa$ a kernel and $\Pi$ a partition of $[0,1]$, we define $\kappa/\Pi$ similarly. The partition $\Pi$ is weakly $\varepsilon$-regular with respect to $\kappa$ if $\|\kappa - \kappa/\Pi\|_{\mathrm{cut}} \le \varepsilon$.
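
For very small graphs, weak regularity can be tested directly. The sketch below is illustrative only: it assumes the cut norm of a kernel $\kappa$ is $\sup_{S,T} |\int_{S \times T} \kappa|$ (the standard definition, which we assume matches the one used earlier in the paper), and it evaluates this by brute force over vertex subsets $S$, which is feasible only for a handful of vertices. The function names are ours.

```python
import numpy as np
from itertools import product

def quotient_kernel(adj, parts, p):
    """G/Pi: the value d_p(P, Q) on each block P x Q of the partition."""
    n = adj.shape[0]
    K = np.zeros((n, n))
    for P in parts:
        for Q in parts:
            e = adj[np.ix_(P, Q)].sum()
            K[np.ix_(P, Q)] = e / (p * len(P) * len(Q))
    return K

def cut_norm(M):
    """Brute-force sup over S, T of |sum_{i in S, j in T} M[i, j]| / n^2."""
    n = M.shape[0]
    best = 0.0
    for mask in product([0, 1], repeat=n):
        col = np.asarray(mask) @ M  # column sums restricted to rows in S
        # For fixed S the optimal T takes all positive (or all negative) columns.
        best = max(best, col[col > 0].sum(), -col[col < 0].sum())
    return best / n**2

def is_weakly_regular(adj, parts, p, eps):
    return cut_norm(adj / p - quotient_kernel(adj, parts, p)) <= eps

# Complete bipartite graph K_{2,2} with classes {0,1} and {2,3}, p = 1.
adj = np.array([[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0], [1, 1, 0, 0]])
print(cut_norm(adj - quotient_kernel(adj, [[0, 1, 2, 3]], 1.0)))
```

In this toy example the partition into the two classes reproduces $\kappa_G$ exactly, whereas the trivial one-part partition is at cut distance $1/8$ from it.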

The next lemma is a sparse equivalent of (a version of) the Frieze-Kannan 'weak' form of Szemerédi's Lemma from [23]. As with many proofs of the various forms of Szemerédi's Lemma, the proof of the dense result is not hard to adapt to the sparse setting: the only additional complication is that one must make sure that the parts of the partition remain large enough so that we can make use of the bounded density assumption. In the following lemma, $p = p(n)$ is any normalizing function with $pn^2 \to \infty$. In principle, the various constants depend on the choice of $p$, but this is not the case if we impose an explicit lower bound on $p(n)$, such as the harmless bound $p \ge n^{-3/2}$.

Lemma 4.3. Let $p = p(n)$ be any function with $0 < p \le 1$ and $pn^2 \to \infty$. Let $\varepsilon > 0$, $C > 0$ and $k \ge 1$ be given. There exist constants $n_0$, $K$ and $\eta > 0$, all depending on $\varepsilon$, $C$ and $k$, such that, if $G_n$ is any graph with $n \ge n_0$ vertices such that

$$d_p(A, B) \le C \text{ whenever } |A|, |B| \ge \eta n, \qquad (29)$$

and $\Pi$ is any partition of $V(G)$ into $k$ parts $P_1, \ldots, P_k$ with sizes as equal as possible, then there is a weakly $(\varepsilon, p)$-regular partition $\Pi'$ of $V(G_n)$ into $K$ parts that refines $\Pi$.

Proof. Reducing $\varepsilon$ if necessary, we may assume that $\varepsilon < C$, say. We assume without comment that $n$ is 'large enough' whenever this is needed.

Let $\Pi_0 = \Pi$. We shall inductively define a sequence $\Pi_t$ of partitions of $V(G)$ into $k_t = 2^tk$ parts, stopping either when we reach some $\Pi_t$ that is weakly $(\varepsilon/2, p)$-regular, or when $t \ge T = \lceil 16C^2/\varepsilon^2 \rceil + 1$. Every part of $\Pi_t$ will have size at least $\gamma^tn/(2k)$, where $\gamma = \varepsilon/(100C) < 1/100$. Note that $\Pi_0$ satisfies this condition.

Set $\eta = \gamma^T/(2k)$, and let $n_0$ be a large constant to be chosen later. We shall write $\kappa_t$ for the kernel $G/\Pi_t$, noting that, since all parts of $\Pi_t$ have size at least $\eta n$, the kernel $\kappa_t$ is bounded by $C$.

Given $\Pi_t$ as above, suppose that $\Pi_t$ is not weakly $(\varepsilon/2, p)$-regular. Then there is a cut $[0,1] = A \cup A^c$ exhibiting this, i.e., a set $A \subset [0,1]$ for which $|\kappa_G(A, A^c) - \kappa_t(A, A^c)| > \varepsilon/2$. Since both $\kappa_G$ and $\kappa_t$ correspond to weighted graphs on $V(G) = \{1, 2, \ldots, n\}$, we may choose the cut $A$ to correspond to a subset of $V(G)$: among all 'worst' cuts, there is a cut of this form.

Our aim is to modify $A$ slightly to obtain a set $B$ (which we may think of as a subset of $V(G)$ or as a subset of $[0,1]$) and then take two parts $P_i \cap B$ and $P_i \cap B^c$ of $\Pi_{t+1}$ for each part $P_i$ of $\Pi_t$; in doing so, we must ensure that neither of these parts is too small. We modify the set $A$ to obtain $B$ in $k_t$ stages, one for each part $P_i$. At each stage, we move a set $S$ of at most $\gamma|P_i| \ge \eta n$ vertices from $A$ to $A^c$ or vice versa, to ensure that both $B$ and $B^c$ meet $P_i$ in at least $\gamma|P_i|$ vertices. Since $\kappa_t$ is bounded by $C$, this changes the value of the cut $\kappa_t(A, A^c)$ by at most $2C\gamma|P_i|/n$.

From (29), the set $S$ meets at most $Cpn\gamma|P_i|$ edges of $G$: to see this, apply (29) to $S$ and $V(G)$ if $|S| \ge \eta n$, and to $S'$ and $V(G)$ otherwise, for any $S' \supset S$ with $\lceil \eta n \rceil$ vertices. Hence, the value of the cut $\kappa_G(A, A^c)$ changes by at most $2C\gamma|P_i|/n$ when we move our set $S$ from one side of the cut to the other. After all these changes, we have

$$|\kappa_t(A, A^c) - \kappa_t(B, B^c)|,\ |\kappa_G(A, A^c) - \kappa_G(B, B^c)| \le 2C\gamma < \varepsilon/8.$$

It follows that

$$|\kappa_G(B, B^c) - \kappa_t(B, B^c)| > \varepsilon/4. \qquad (30)$$

Let $\Pi_{t+1}$ be the partition obtained by intersecting each part of $\Pi_t$ with $B$ and $B^c$, noting that $\Pi_{t+1}$ has all the required properties. Set $\kappa_{t+1} = G/\Pi_{t+1}$, noting that $\kappa_{t+1}(B, B^c) = \kappa_G(B, B^c)$, since $\Pi_{t+1}$ refines the partition $(B, B^c)$. From (30) it thus follows that

$$\|\kappa_{t+1} - \kappa_t\|_1 \ge \|\kappa_{t+1} - \kappa_t\|_{\mathrm{cut}} \ge \varepsilon/4,$$

with the final inequality witnessed by the cut $(B, B^c)$. Hence, $\|\kappa_{t+1} - \kappa_t\|_2^2 \ge \|\kappa_{t+1} - \kappa_t\|_1^2 \ge \varepsilon^2/16$. Since $\kappa_t$ may be obtained from $\kappa_{t+1}$ by averaging over rectangles, $\kappa_t$ and $\kappa_{t+1} - \kappa_t$ are orthogonal: for any two parts $P_i$, $P_j$ of $\Pi_t$, the kernel $\kappa_t$ is constant on $P_i \times P_j$. Also, $\int_{P_i \times P_j} \kappa_{t+1} = \int_{P_i \times P_j} \kappa_G = \int_{P_i \times P_j} \kappa_t$. Thus $\int_{P_i \times P_j} \kappa_t(\kappa_{t+1} - \kappa_t) = 0$. Summing over $i$ and $j$ it follows that $\int \kappa_t(\kappa_{t+1} - \kappa_t) = 0$. Thus,

$$\|\kappa_{t+1}\|_2^2 = \|\kappa_t\|_2^2 + \|\kappa_{t+1} - \kappa_t\|_2^2 \ge \|\kappa_t\|_2^2 + \varepsilon^2/16.$$

It follows by induction that $\|\kappa_t\|_2^2 \ge t\varepsilon^2/16$ as long as our construction continues. But, as noted above, $\kappa_t$ is bounded by $C$, so our construction must stop after at most $16C^2/\varepsilon^2$ steps. Since this number is smaller than $T$, we must stop at a weakly $(\varepsilon/2, p)$-regular partition.
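
The orthogonality step behind the energy increment can be checked numerically. The following sketch is an illustration under our own conventions (a random symmetric 0/1 matrix stands in for $G$, and partitions are lists of index lists); it verifies that for a partition and a refinement of it, the averaged kernels $\kappa_t$ and $\kappa_{t+1}$ satisfy $\|\kappa_{t+1}\|_2^2 = \|\kappa_t\|_2^2 + \|\kappa_{t+1} - \kappa_t\|_2^2$.

```python
import numpy as np

def quotient(M, parts):
    """Replace M by its average over each rectangle P x Q of the partition."""
    K = np.zeros_like(M, dtype=float)
    for P in parts:
        for Q in parts:
            K[np.ix_(P, Q)] = M[np.ix_(P, Q)].mean()
    return K

def l2_sq(M):
    """Squared L^2 norm of the step kernel with values M on an n-by-n grid."""
    return (M**2).mean()

rng = np.random.default_rng(0)
n, p = 12, 0.3
upper = np.triu(rng.random((n, n)) < p, k=1)
kappa_G = (upper | upper.T).astype(float) / p

coarse = [list(range(0, 6)), list(range(6, 12))]
fine = [list(range(0, 3)), list(range(3, 6)),
        list(range(6, 9)), list(range(9, 12))]
k_t, k_next = quotient(kappa_G, coarse), quotient(kappa_G, fine)

# Pythagoras: the energy of the finer kernel splits into the coarse
# energy plus the energy of the difference.
print(l2_sq(k_next), l2_sq(k_t) + l2_sq(k_next - k_t))
```

Since each step of the construction raises $\|\kappa_t\|_2^2$ by at least $\varepsilon^2/16$ while the bound $\kappa_t \le C$ caps it at $C^2$, the identity above is what limits the number of refinement steps.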

To complete the proof we modify the final partition $\Pi_t$ slightly. Set $K = k\lceil \gamma^{-T} \rceil$, and note that, since $t \le T - 1$, each part of $\Pi_t$ has size at least $\gamma^{-1}n/(2K)$. First, adjust the parts slightly so that the size of each is of the form $a\lfloor n/K \rfloor + b\lceil n/K \rceil$, $a, b \in \mathbb{Z}^+$, replacing the kernel $\kappa_t$ by a new kernel $\kappa'$ corresponding to the altered partition $\Pi'$. Arguing as above, $\|\kappa_t - \kappa'\|_{\mathrm{cut}} \le 2C\gamma < \varepsilon/4$, so, by the triangle inequality and weak $(\varepsilon/2, p)$-regularity of $\Pi_t$, we have

$$\|\kappa' - \kappa_G\|_{\mathrm{cut}} \le \|\kappa_t - \kappa_G\|_{\mathrm{cut}} + \|\kappa_t - \kappa'\|_{\mathrm{cut}} \le \varepsilon/2 + \varepsilon/4 = 3\varepsilon/4.$$

Finally, we split each part randomly into parts of sizes exactly $\lfloor n/K \rfloor$ and $\lceil n/K \rceil$, obtaining a partition $\Pi''$ into $K$ parts whose sizes are as equal as possible. We write $\kappa''$ for the corresponding kernel. Since $\Pi'$ has $O(1)$ parts, and we have $\Omega(pn^2)$ edges between any two parts with density at least $\varepsilon/100$, say, it follows from Chernoff's inequality that if $n$ is large enough, which we enforce by choosing $n_0$ suitably, then with probability at least 0.99 the density $d_p(A, B)$ between every pair $(A, B)$ of new parts $A$ and $B$ coming from parts $P_i$ and $P_j$ of $\Pi'$ with $d_p(P_i, P_j) \ge \varepsilon/100$ is $d_p(P_i, P_j)(1 + o(1))$. Since the densities $d_p(P_i, P_j)$ are uniformly bounded by $C$, it follows that with probability at least 0.99 we have $\|\kappa'' - \kappa'\|_1 \le \varepsilon/100$. But then

$$\|\kappa'' - \kappa_G\|_{\mathrm{cut}} \le \|\kappa'' - \kappa'\|_{\mathrm{cut}} + \|\kappa' - \kappa_G\|_{\mathrm{cut}} \le \|\kappa'' - \kappa'\|_1 + \|\kappa' - \kappa_G\|_{\mathrm{cut}} \le \varepsilon/100 + 3\varepsilon/4,$$

so our final partition $\Pi''$ is indeed weakly $(\varepsilon, p)$-regular. □

If for any reason we want a weakly $(\varepsilon, p)$-regular partition into a particular number $K$ of parts (which must be a multiple of the number in the original partition if we are refining a given partition), the proof above gives such a partition for any large enough $K$, indeed, for any $K \ge k\lceil \gamma^{-T} \rceil$. Of course, $n_0$ then depends on $K$.

Remark 4.4. The proof of Lemma 4.3 works even if $p$ is very small, say of order $1/n$. However, this is of no help: it is impossible for Assumption 4.1 (the sequence version of (29)) to be satisfied in this range, except in the trivial case where $e(G_n) = o(pn^2)$ (so $p$ is not the appropriate normalizing function). Indeed, passing to a subsequence where $e(G_n)/(pn^2)$ is bounded away from zero, picking any $\varepsilon pn^2$ edges of $G_n$, and putting one endpoint of each edge into $A$ and the other into $B$, we find sets $A$, $B$ with $|A|, |B| \le \varepsilon pn^2$ but $e(A, B) \ge \varepsilon pn^2$, which gives $d_p(A, B) \ge 1/(\varepsilon p^2n^2)$, which tends to infinity as $\varepsilon \to 0$.

4.2 Strongly regular partitions

Usually, when working with the cut metric, weak $\varepsilon$-regularity turns out to be just as good as the usual stronger $\varepsilon$-regularity. In the dense case, this is true also when considering subgraph counts. However, for the subgraph counts we consider in the next section, it turns out that we do in fact need the usual form of $\varepsilon$-regularity.

As usual, a pair $(A, B)$ of (not necessarily disjoint) subsets of $V(G)$ is an $(\varepsilon, p)$-regular pair if $|d_p(A', B') - d_p(A, B)| \le \varepsilon$ whenever $A' \subset A$ and $B' \subset B$ satisfy $|A'| \ge \varepsilon|A|$ and $|B'| \ge \varepsilon|B|$. A partition $\Pi = (P_1, \ldots, P_k)$ of $V(G)$ is $(\varepsilon, p)$-regular if the parts $P_i$ each have size $\lfloor n/k \rfloor$ or $\lceil n/k \rceil$, and all but at most $\varepsilon\binom{k}{2}$ of the unordered pairs $\{P_i, P_j\}$, $i \ne j$, are $(\varepsilon, p)$-regular. The definition (now simply of $\varepsilon$-regularity) for a kernel is similar, although here one partitions the interval $[0,1]$ into parts with measure exactly $1/k$.

The following is (essentially) the sparse version of Szemerédi's Lemma observed by Kohayakawa and Rödl; see [52], where a closely related result is proved. For a proof, see also Gerke and Steger [23]. We shall include a proof here as we state the result in a slightly different way (which makes no real difference), and the use of kernels allows one to phrase the proof a little more simply than in [32] or [24].

Lemma 4.5. Let $p = p(n)$ be any function with $0 < p \le 1$ and $pn^2 \to \infty$. Let $\varepsilon > 0$, $C > 0$ and $k \ge 1$ be given. There exist constants $n_0$, $K$ and $\eta > 0$, all depending on $\varepsilon$, $C$ and $k$, such that, if $G_n$ is any graph with $n \ge n_0$ vertices such that

$$d_p(A, B) \le C \text{ whenever } |A|, |B| \ge \eta n, \qquad (31)$$

and $\Pi$ is any partition of $V(G)$ into $k$ parts $P_1, \ldots, P_k$ with sizes as equal as possible, then there is an $(\varepsilon, p)$-regular partition $\Pi'$ of $V(G_n)$ into at most $K$ parts that refines $\Pi$.

Proof. Reducing $\varepsilon$ and/or increasing $C$ if necessary, we may suppose for convenience that $\varepsilon < 1$ and $C > 1$.

Set $\gamma = \varepsilon^3/(100C)$. This time we inductively define a sequence $\Pi_t$ of partitions of $V(G)$ into $k_t$ parts, where $\Pi_0 = \Pi$, $k_0 = k$, and $k_{t+1} = k_t\lceil k_t2^{k_t}/\gamma \rceil$, stopping either when we reach some $\Pi_t$ that is $(\varepsilon, p)$-regular, or when $t \ge T = \lceil 20C^2/\varepsilon^5 \rceil + 1$. The parts of each $\Pi_t$ will have sizes as equal as possible. Note that $\Pi_0$ satisfies this condition.

Set $\eta = 1/(2k_T)$, and let $n_0$ be a large constant to be chosen later. We assume throughout that $n \ge n_0$. As before, we write $\kappa_t$ for the kernel $G/\Pi_t$, noting that, since all parts of $\Pi_t$ have size at least $\eta n$, the kernel $\kappa_t$ is bounded by $C$.

The key (standard) observation is the following. Let $A$ and $B$ be parts of $\Pi_t$, so $\kappa_t$ is by definition constant on $A \times B$, and let $A' \subset A$ and $B' \subset B$. Let $\Pi'$ be any partition refining $\Pi_t$ such that each of $A'$ and $B'$ is a union of parts of $\Pi'$, and let $\kappa' = G/\Pi'$ be the corresponding kernel. Restricted to $A \times B$, the function $\kappa'$ integrates to $d_p(A, B)\mu(A)\mu(B) = \int_{A \times B} \kappa_t$, since $A$ and $B$ are unions of parts of $\Pi'$. Hence, $\kappa_t$ and $\kappa' - \kappa_t$ are orthogonal on this set. Using the fact that $A'$ and $B'$ are unions of parts of $\Pi'$, we see that $\int_{A' \times B'} \kappa' = d_p(A', B')\mu(A')\mu(B')$, which differs from the integral of $\kappa_t$ over the same set by $|d_p(A', B') - d_p(A, B)|\mu(A')\mu(B')$. It follows that $\|\kappa' - \kappa_t\|_2^2$ is at least $(d_p(A', B') - d_p(A, B))^2\mu(A')\mu(B')$, and hence, using orthogonality, that

$$\int_{A \times B} (\kappa')^2 \ge \int_{A \times B} \kappa_t^2 + (d_p(A', B') - d_p(A, B))^2\mu(A')\mu(B'). \qquad (32)$$

Suppose then that $\Pi_t$ is not $(\varepsilon, p)$-regular, and let $A_1, \ldots, A_{k_t}$ denote the parts of $\Pi_t$. Then there are at least $\varepsilon\binom{k_t}{2}$ pairs $\{A_i, A_j\}$ of parts of $\Pi_t$ that are not $(\varepsilon, p)$-regular. For each, pick sets $A_{ij} \subset A_i$ and $A_{ji} \subset A_j$ witnessing this, i.e., with $|d_p(A_{ij}, A_{ji}) - d_p(A_i, A_j)| > \varepsilon$ and $|A_{ij}| \ge \varepsilon|A_i|$, $|A_{ji}| \ge \varepsilon|A_j|$. Let $\Pi'$ be the partition whose parts are all atoms formed by the sets $A_i$ and the sets $A_{ij}$ taken together, so $\Pi'$ refines $\Pi_t$, and each $A_{ij}$ is a union of parts of $\Pi'$. We could estimate the $L^2$-norm of $G/\Pi'$ using (32), but this will not be useful if some parts of $\Pi'$ are too small, so we first adjust the part sizes.

Define $\Pi_{t+1}$ by dividing each $A_i$ into $k_{t+1}/k_t$ parts whose sizes are as equal as possible, so that each part of $\Pi'$ differs from a union of parts of $\Pi_{t+1}$ in at most $n/k_{t+1}$ vertices: to do this, keep taking for a part of $\Pi_{t+1}$ a subset of some part of $\Pi'$, until what is left of every part of $\Pi'$ is too small. For each $i$, there are at most $k_t$ sets $A_{ij}$ inside $A_i$, so $A_i$ is a union of at most $2^{k_t}$ parts of $\Pi'$. It follows that there is some union $A'_{ij}$ of parts of $\Pi_{t+1}$ with

$$|A_{ij} \triangle A'_{ij}| \le 2^{k_t}n/k_{t+1} \le \gamma n/k_t^2.$$

Arguing as in the proof of Lemma 4.3, it follows from (31) that the symmetric difference $S_{ij}$ of $A_{ij}$ and $A'_{ij}$ meets at most

$$Cpn|S_{ij}| \le Cp\gamma n^2/k_t^2 \le \varepsilon^3p|A_i||A_j|/99$$

edges of $G$, if $n$ is sufficiently large. Since $|S_{ij}| \le \varepsilon^3|A_i|/100$, say, while $|A_{ij}| \ge \varepsilon|A_i|$ and $|A_{ji}| \ge \varepsilon|A_j|$, it follows crudely that

$$|d_p(A'_{ij}, A'_{ji}) - d_p(A_{ij}, A_{ji})| \le \varepsilon/2,$$

which implies that

$$|d_p(A'_{ij}, A'_{ji}) - d_p(A_i, A_j)| \ge \varepsilon/2.$$

Now $A'_{ij}$ and $A'_{ji}$ are unions of parts of $\Pi_{t+1}$, and these sets have size at least $\varepsilon n/(2k_t)$. Hence, from (32),

$$\int_{A_i \times A_j} \kappa_{t+1}^2 \ge \int_{A_i \times A_j} \kappa_t^2 + \varepsilon^4/(16k_t^2)$$

for each of the at least $\varepsilon\binom{k_t}{2}$ irregular pairs $\{A_i, A_j\}$. Since $\int_{A_i \times A_j} \kappa_{t+1}^2 \ge \int_{A_i \times A_j} \kappa_t^2$ always holds, it follows that $\|\kappa_{t+1}\|_2^2 \ge \|\kappa_t\|_2^2 + \varepsilon^5/20$.

If the construction above does not stop before step $T$, then by induction we have $\|\kappa_t\|_2^2 \ge t\varepsilon^5/20$ for $0 \le t \le T$. But each $\kappa_t$ is bounded by $C$, so $\|\kappa_t\|_2^2 \le C^2$, giving a contradiction. Hence the construction does stop before step $T$, giving an $(\varepsilon, p)$-regular partition with $k_t \le K = k_T$ parts. □

Note that Lemma 4.5 implies (essentially) Lemma 4.3: it is easy to check that an $(\varepsilon, p)$-regular partition is, say, weakly $(10(C + 1)\varepsilon, p)$-regular, provided the parts are large enough for (31) to hold. However, one of course obtains much worse bounds on the number of parts using the stronger notion of regularity.

Remark 4.6. Let us illustrate once again the difference between the dense and sparse cases with a simple observation. Given a pair $(A, B)$ of sets of vertices of a graph $G$, let $C_4(A, B)$ denote the number of homomorphisms from $C_4$ into the subgraph spanned by $A \cup B$ mapping a given pair of opposite vertices into $A$ and the other pair into $B$. Standard convexity arguments show that $C_4(A, B) \ge d(A, B)^4|A|^2|B|^2$. The pair $(A, B)$ is $(\varepsilon, p)$-$C_4$-minimal if $C_4(A, B) \le (d(A, B)^4 + \varepsilon p^4)|A|^2|B|^2$. In the dense case (with $p = 1$) it is well known and very easy to check that $\varepsilon$-regularity and $\varepsilon$-$C_4$-minimality are essentially equivalent: $\varepsilon$-regularity implies $f(\varepsilon)$-$C_4$-minimality, and $\varepsilon$-$C_4$-minimality implies $g(\varepsilon)$-regularity, for some $f(\varepsilon), g(\varepsilon)$ with $f(\varepsilon), g(\varepsilon) \to 0$ as $\varepsilon \to 0$.
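
The quantity $C_4(A, B)$ and the convexity bound are easy to compute. The sketch below is illustrative only: it assumes a 0/1 NumPy adjacency matrix, takes $d(A, B) = e_G(A, B)/(|A||B|)$ to be the unnormalized density, and uses function names of our own choosing. It counts homomorphisms by summing squared codegrees over ordered pairs of vertices of $A$.

```python
import numpy as np

def c4_homs(adj, A, B):
    """Homomorphisms a1-b1-a2-b2 of C4 with a1, a2 in A and b1, b2 in B."""
    M = adj[np.ix_(A, B)].astype(float)
    codeg = M @ M.T  # codeg[a1, a2] = number of common neighbours in B
    return (codeg**2).sum()

def c4_lower_bound(adj, A, B):
    """The convexity bound d(A, B)^4 |A|^2 |B|^2."""
    d = adj[np.ix_(A, B)].sum() / (len(A) * len(B))
    return d**4 * len(A)**2 * len(B)**2

def is_c4_minimal(adj, A, B, eps, p=1.0):
    slack = eps * p**4 * len(A)**2 * len(B)**2
    return c4_homs(adj, A, B) <= c4_lower_bound(adj, A, B) + slack

# A perfect matching between A = {0, 1} and B = {2, 3}.
adj = np.zeros((4, 4), dtype=int)
for a, b in [(0, 2), (1, 3)]:
    adj[a, b] = adj[b, a] = 1
print(c4_homs(adj, [0, 1], [2, 3]), c4_lower_bound(adj, [0, 1], [2, 3]))
```

The matching has twice as many $C_4$-homomorphisms as the bound requires, whereas a complete bipartite pair attains the bound with equality.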

Let $\varepsilon > 0$ and $M$ be given. By counting $C_4$s it is easy to see that there is a function $f(\varepsilon)$ with $f(\varepsilon) \to 0$ as $\varepsilon \to 0$ such that, if $n$ is large enough and $(A, B)$ is $\varepsilon$-regular with $|A| = |B| = n$, then we may partition $A$ and $B$ into sets $A_1, \ldots, A_M$ and $B_1, \ldots, B_M$ of almost equal sizes so that every pair $(A_i, B_j)$ is $f(\varepsilon)$-regular. Indeed, a random partition has this property with probability tending to 1, since by standard concentration results (for example, the Hoeffding-Azuma inequality), the edge densities and '$C_4$-densities' of the pairs $(A_i, B_j)$ are highly concentrated about the corresponding densities for $(A, B)$. It follows immediately that in the usual dense Szemerédi's Lemma [55], we may specify in advance the number of parts $K$ we would like our partition to have, provided (as in the weak case) that $K$ is large enough given $\varepsilon$, and $n$ large enough given $\varepsilon$ and $K$.

In the sparse case, the fact about random partitioning above is presumably true, but the simple proof using $C_4$-counts fails totally. It is still true that $(\varepsilon, p)$-$C_4$-minimality implies $(f(\varepsilon), p)$-regularity, but the reverse implication fails. Indeed, whenever $p = p(n) \to 0$, given any pair $(A, B)$, we may add a small dense (say complete bipartite) subgraph with too few edges to disturb regularity, but containing many more than $p^4|A|^2|B|^2$ $C_4$s.

4.3 Szemerédi's lemma and convergence in the cut norm

We start with a consequence of Lemma 4.3 concerning the cut norm.

Corollary 4.7. Let $(G_n)$ be a sequence of graphs satisfying Assumption 4.1. Then there is a kernel $\kappa : [0,1]^2 \to [0, C]$ and a subsequence $(G_{n_i})$ of $(G_n)$ such that $d_{\mathrm{cut}}(G_{n_i}, \kappa) \to 0$. Moreover, we may label the vertices of $G_{n_i}$ with $1, 2, \ldots, n_i$ so that $\|\kappa_{G_{n_i}} - \kappa\|_{\mathrm{cut}} \to 0$.

Proof. We shall only sketch the proof as the argument is exactly the same as that of Lovász and Szegedy [34] for the dense case. Note that given any $\eta > 0$ and $\varepsilon > 0$, our graphs $G_n$ satisfy the assumption (29) of Lemma 4.3 with $C + \varepsilon$ in place of $C$ whenever $n$ is large enough.

First, let us apply Lemma 4.3 with $k = 1$ and $\varepsilon = \varepsilon_1 = 1/2$, say, to obtain a weakly $(\varepsilon_1, p)$-regular partition $\Pi_{n,1}$ of $G_n$ into $k_1 = K$ parts, for all large enough $n$. We may relabel the vertices of each $G_n$ so that the parts of $\Pi_{n,1}$ are all intervals. Each kernel $G_n/\Pi_{n,1}$ is characterized by a $k_1$-by-$k_1$ density matrix, whose entries all lie in $[0, C + \varepsilon_1]$. (Indeed, if $k_1$ happens to divide $n$, then the kernel is exactly the kernel obtained from the matrix in the obvious way.) Since these matrices live in a compact set, $[0, C + \varepsilon_1]^{k_1^2}$, they have a convergent subsequence. Passing to the corresponding subsequence of $G_n$, we then have $G_n/\Pi_{n,1} \to \kappa_1$ pointwise almost everywhere, and hence in $L^1$ and in the cut norm. Since the partitions $\Pi_{n,1}$ are weakly $(\varepsilon_1, p)$-regular, we have $\|\kappa_{G_n} - G_n/\Pi_{n,1}\|_{\mathrm{cut}} \le \varepsilon_1$. Passing far enough along our subsequence, it follows that $\|\kappa_{G_n} - \kappa_1\|_{\mathrm{cut}} \le 2\varepsilon_1$.

Working within the subsequence defined above, apply Lemma 4.3 again with $\varepsilon = \varepsilon_2 = 1/4$, say, and $k = k_1$. For each $n$ we find a partition $\Pi_{n,2}$ refining $\Pi_{n,1}$ with $k_2 = K(\varepsilon_2, C, k_1)$ parts. Relabelling vertices, we may assume that each part of each $\Pi_{n,2}$ is an interval. (Note that we only reorder the vertices within parts of $\Pi_{n,1}$.) As before, on a subsequence we have $G_n/\Pi_{n,2} \to \kappa_2$ for some kernel $\kappa_2$ constant on squares of side-length $1/k_2$. Since $\Pi_{n,2}$ refines $\Pi_{n,1}$ for each $n$, it follows that the value of $\kappa_1$ on each $1/k_1$-by-$1/k_1$ square is exactly the average of $\kappa_2$ over this set; to see this, let $n \to \infty$.

Iterating, we find kernels $\kappa_1, \kappa_2, \ldots$ each of which can be obtained by averaging the next one, and graphs $G_{n_i}$ with $\|\kappa_{G_{n_i}} - \kappa_i\|_{\mathrm{cut}} \le 2\varepsilon_i = 2^{1-i}$, say. To complete the proof we simply observe that the sequence $(\kappa_t)$ is a martingale on the state space $[0,1]^2$. Since each $\kappa_t$ is bounded by $C + \varepsilon_t \le C + 1$, by the Martingale Convergence Theorem there is a kernel $\kappa : [0,1]^2 \to [0, C]$ with $\kappa_t \to \kappa$ pointwise almost everywhere, and hence in $L^1$ and in the cut-norm. Then $\|\kappa_{G_{n_i}} - \kappa\|_{\mathrm{cut}} \to 0$, as required. □

The corollary above says that any (suitable) sequence of graphs has a subsequence converging to a kernel, and is a simple consequence of Szemerédi's Lemma and the Martingale Convergence Theorem. Together with Lemma 4.2, it shows that Assumption 4.1 is the correct assumption to impose on sequences of graphs when we seek limits that are bounded kernels $\kappa : [0,1]^2 \to [0, C]$. Before turning to an application of Corollary 4.7, let us note an even simpler consequence of the Martingale Convergence Theorem.

Lemma 4.8. Let $\kappa$ be a bounded kernel, and for $k \ge 1$, let $\kappa_k$ be the piecewise constant kernel obtained by dividing $[0,1]^2$ into $2^{2k}$ squares of side $2^{-k}$, and replacing $\kappa$ by its average over each square. Then $\kappa_k \to \kappa$ pointwise almost everywhere and also in $L^p$ for any $p$.
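
The dyadic averaging in Lemma 4.8 can be illustrated numerically. The sketch below makes assumptions beyond the text: the kernel is represented by its values on an $N \times N$ grid with $N$ a power of 2 (so it is really a fine step kernel standing in for $\kappa$), and the function name `dyadic_average` is ours. It checks the martingale property (averaging a finer level reproduces the coarser one) and that the $L^2$ error does not increase with $k$.

```python
import numpy as np

def dyadic_average(K, k):
    """Average an N-by-N array over the 2^k-by-2^k grid of equal squares."""
    N = K.shape[0]
    m = 2**k
    if N % m:
        raise ValueError("grid size must be divisible by 2^k")
    b = N // m
    # Index (I, r, J, s) corresponds to entry (I*b + r, J*b + s).
    blocks = K.reshape(m, b, m, b).mean(axis=(1, 3))
    return np.kron(blocks, np.ones((b, b)))

N = 64
x = (np.arange(N) + 0.5) / N
K = np.outer(x, x)  # the kernel kappa(x, y) = xy sampled at cell midpoints

errors = [np.sqrt(((K - dyadic_average(K, k))**2).mean()) for k in range(7)]
print(errors)
```

On this grid the level $k = 6$ reproduces the sampled kernel exactly, so the final error is zero; for a genuine kernel the errors only tend to zero.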

Proof. The sequence $\kappa_k$ is a bounded martingale on $[0,1]^2$, so pointwise convergence is given by the Martingale Convergence Theorem. Since the sequence $\kappa_k$ is bounded by $\sup \kappa$, convergence in $L^p$ follows by dominated convergence. □

A consequence of Corollary 4.7 is that it allows us to compare the two different versions of the cut metric. Recall that for graphs $G_1$, $G_2$, we defined $d_{\mathrm{cut}}(G_1, G_2)$ by first passing to kernels taking the values 0 and $1/p$. If $G_1$ and $G_2$ have the same number of vertices, then there is a more natural definition of their cut-distance, $\hat{d}_{\mathrm{cut}}(G_1, G_2)$, defined in the same way but only allowing rearrangements that 'map whole vertices to whole vertices'. As in the dense case, $d_{\mathrm{cut}}(G_1, G_2)$ and $\hat{d}_{\mathrm{cut}}(G_1, G_2)$ are defined by (JSj> and (JOj), respectively; the difference between the sparse and dense cases is in the normalization of $\kappa_{G_i}$. Writing $d^1_{\mathrm{cut}}$ and $\hat{d}^1_{\mathrm{cut}}$ for the metrics defined using $p = 1$, Borgs, Chayes, Lovász, Sós and Vesztergombi [T31 Theorem 2.3] showed that these metrics are equivalent, proving that

$$d^1_{\mathrm{cut}}(G_1, G_2) \le \hat{d}^1_{\mathrm{cut}}(G_1, G_2) \le 32\,d^1_{\mathrm{cut}}(G_1, G_2)^{1/67}. \qquad (33)$$

In fact, they proved (33) for edge-weighted graphs, as long as all edge weights lie in $[-1, 1]$. Unlike simple Lipschitz equivalence, which may also hold, this does not directly carry over to the sparse setting: we have $d_{\mathrm{cut}} = p^{-1}d^1_{\mathrm{cut}}$ and $\hat{d}_{\mathrm{cut}} = p^{-1}\hat{d}^1_{\mathrm{cut}}$, so (33) can be written as

$$d_{\mathrm{cut}}(G_1, G_2) \le \hat{d}_{\mathrm{cut}}(G_1, G_2) \le 32\,d_{\mathrm{cut}}(G_1, G_2)^{1/67}p^{-66/67},$$

which is of little if any use here. However, the equivalence of the two metrics in the sparse case is not too hard to deduce from (33), using Corollary 4.7.

Lemma 4.9. For $i = 1, 2$, let $(G_n^{(i)})$ be a sequence of graphs satisfying the bounded density assumption 4.1. Then $\hat{d}_{\mathrm{cut}}(G_n^{(1)}, G_n^{(2)}) \to 0$ if and only if $d_{\mathrm{cut}}(G_n^{(1)}, G_n^{(2)}) \to 0$.

Proof. If $\hat{d}_{\mathrm{cut}}(G_n^{(1)}, G_n^{(2)}) \to 0$ then, since $d_{\mathrm{cut}} \le \hat{d}_{\mathrm{cut}}$, it follows trivially that $d_{\mathrm{cut}}(G_n^{(1)}, G_n^{(2)}) \to 0$.

Suppose now that $d_{\mathrm{cut}}(G_n^{(1)}, G_n^{(2)}) \to 0$; our aim is to show that $\hat{d}_{\mathrm{cut}}(G_n^{(1)}, G_n^{(2)}) \to 0$, so we may suppose that this is not the case. Hence, passing to a subsequence, we may assume that $\hat{d}_{\mathrm{cut}}(G_n^{(1)}, G_n^{(2)}) \ge \delta$ for some positive $\delta$ and all $n$ in our subsequence.

Applying Corollary 4.7 twice, the second time to a suitable subsequence, we find kernels $\kappa_1, \kappa_2 : [0,1]^2 \to [0, C]$, and subsequences of the sequences $(G_n^{(i)})$, defined for the same values of $n$, on which $\|\kappa_{G_n^{(i)}} - \kappa_i\|_{\mathrm{cut}} \to 0$. Since $d_{\mathrm{cut}}(G_n^{(1)}, G_n^{(2)}) \to 0$, it follows that $d_{\mathrm{cut}}(\kappa_1, \kappa_2) = 0$.

For any $\varepsilon > 0$, by Lemma 4.8 we may find a $K$ and kernels $\kappa'_1, \kappa'_2 : [0,1]^2 \to [0, C]$ that are constant on squares of side $1/K$, with $\|\kappa'_i - \kappa_i\|_{\mathrm{cut}} \le \varepsilon$. Since the kernels $\kappa'_i$ may be thought of as weighted graphs, it would appear that we have gone round in circles, but the point is that they are dense weighted graphs. Regarding the kernels $\kappa'_1/C$ and $\kappa'_2/C$ as weighted graphs with edge weights in $[0,1]$, we have

$$d^1_{\mathrm{cut}}(\kappa'_1/C, \kappa'_2/C) = d_{\mathrm{cut}}(\kappa'_1, \kappa'_2)/C \le 2\varepsilon/C,$$

so (33) gives

$$\hat{d}_{\mathrm{cut}}(\kappa'_1, \kappa'_2) = C\hat{d}^1_{\mathrm{cut}}(\kappa'_1/C, \kappa'_2/C) \le 32C(2\varepsilon/C)^{1/67} = O(\varepsilon^{1/67}).$$

Hence, there is a rearrangement of $\kappa'_1$ preserving intervals that is close to $\kappa'_2$ in the cut norm. Ignoring divisibility, adapting this rearrangement to the graph $G_n^{(1)}$, $n$ much larger than $K$, and using $\|\kappa_{G_n^{(i)}} - \kappa'_i\|_{\mathrm{cut}} \le \varepsilon + o(1)$, it follows that $\hat{d}_{\mathrm{cut}}(G_n^{(1)}, G_n^{(2)}) \le O(\varepsilon) + O(\varepsilon^{1/67})$. Choosing $\varepsilon$ small enough, the final bound is less than $\delta$, contradicting our assumptions. □

Corollary 4.7 shows that one property of the cut metric carries over to the sparse setting: for every suitable sequence $(G_n)$, i.e., any sequence satisfying the bounded density assumption 4.1, there is a kernel $\kappa$ and a subsequence converging to $\kappa$ in $d_{\mathrm{cut}}$. In the other direction, as in the dense case, such a sequence is given by the natural random construction.

Lemma 4.10. Let $p = p(n)$ satisfy $np \to \infty$, let $C > 0$ be constant, let
$\kappa : [0,1]^2 \to [0,C]$ be a bounded kernel, and let $G_n = G_p(n, \kappa)$. Then $d_{\mathrm{cut}}(G_n, \kappa) \to 0$
almost surely. Also, the sequence $(G_n)$ satisfies the bounded density assumption 4.1
with probability 1.

Proof. The second statement is essentially immediate from Chernoff's inequality,
constructing $G_n$ as a subgraph of the Erdős-Rényi random graph $G(n, Cp)$;
it also follows from the first statement and Lemma 4.2.

We now turn to the proof that $d_{\mathrm{cut}}(G_n, \kappa) \to 0$. Recall that $\kappa$ is of finite
type if $[0,1]$ may be partitioned into sets $A_1, \ldots, A_k$ so that $\kappa$ is constant on
each rectangle $A_i \times A_j$. We first suppose that $\kappa$ is of finite type. Rearranging
$\kappa$, and ignoring parts with measure zero, we may assume that each $A_i$ is an
interval with positive measure. Recall that $G_n = G_p(n, \kappa)$ is constructed by
first choosing the 'types' $x_1, \ldots, x_n$ of the vertices independently and uniformly
at random from $[0,1]$. Let $n_i$ denote the number of vertices of type $i$, noting
that we have $n_i \sim \mu(A_i) n$ a.s. Let us adjust the intervals $A_i$ slightly, replacing
$A_i$ by a set $A_i'$ ($= A_i'(n)$) with measure $n_i/n$. Let $\kappa' = \kappa'(n)$ be the adjusted
kernel, taking on $A_i' \times A_j'$ the value that $\kappa$ takes on $A_i \times A_j$. Since, a.s., we
adjust the length of each $A_i$ by $o(1)$, the kernels $\kappa'$ and $\kappa$ differ on a set of
measure $o(1)$. Since each is bounded, it follows that

$\|\kappa - \kappa'\|_{\mathrm{cut}} \le \|\kappa - \kappa'\|_1 \to 0$    (34)

a.s., as $n \to \infty$.

Given $x_1, \ldots, x_n$, let $G'$ be the weighted graph in which each edge $ij$ is present
and has weight $w_{ij} = p\kappa(x_i, x_j)$. Then, relabelling the vertices so that those
with $x_i \in A_k$ correspond to the set $A_k'$, we see that $\kappa_{G'} = \kappa'$. The graph $G_n$ may
be constructed from $G'$ by simply selecting each edge $ij$ independently, with
probability equal to its weight in $G'$. As noted earlier, for a kernel corresponding
to a (weighted) graph, the cut norm (defined by (3)) is realized by a cut
corresponding to a partition of the vertex set, so

$\|\kappa_{G_n} - \kappa'\|_{\mathrm{cut}} = \|\kappa_{G_n} - \kappa_{G'}\|_{\mathrm{cut}} = \max_{S \subset V(G_n)} \frac{|e_{G_n}(S, S^c) - e_{G'}(S, S^c)|}{n^2 p}$,

where $e_{G'}(S, S^c) = \sum_{i \in S} \sum_{j \in S^c} w_{ij}$.

Having conditioned on $x_1, \ldots, x_n$, for each $S$ the random variable $X = e_{G_n}(S, S^c)$
has mean exactly $\sum_{i \in S} \sum_{j \in S^c} w_{ij}$. Furthermore, $E(X) = O(n^2 p)$. Since $X$ is a
sum of independent indicator variables, it follows from (for example) the Chernoff
bounds that for any $\varepsilon > 0$ we have $P(|X - E(X)| \ge \varepsilon n^2 p) \le \exp(-c_\varepsilon n^2 p)$
for some $c_\varepsilon > 0$. Since $n^2 p = \omega(n)$, this probability decays superexponentially.
Since there are only $2^n$ sets $S$ to consider, we see that $P(\|\kappa_{G_n} - \kappa'\|_{\mathrm{cut}} \ge \varepsilon)$
decays superexponentially as $n \to \infty$. Since $\varepsilon > 0$ was arbitrary, using (34) it
follows that $d_{\mathrm{cut}}(G_n, \kappa) \to 0$ a.s.

So far we assumed that $\kappa$ was of finite type. Given an arbitrary $\kappa$, for each
$\varepsilon > 0$ we can find a finite type approximation $\kappa_\varepsilon$ to $\kappa$ with

$\|\kappa_\varepsilon - \kappa\|_{\mathrm{cut}} \le \|\kappa_\varepsilon - \kappa\|_1 \le \varepsilon$;

see, for example, Lemma 4.8. One can couple the random graphs $G_n = G_p(n, \kappa)$
and $G_n' = G_p(n, \kappa_\varepsilon)$ using the same vertex types $x_1, \ldots, x_n$ for each, in such a
way that the symmetric difference $G_n \triangle G_n'$ has the distribution of $G_p(n, \Delta\kappa)$,
where $\Delta\kappa(x, y) = |\kappa(x, y) - \kappa_\varepsilon(x, y)|$. The expected number of edges of $G_p(n, \Delta\kappa)$
is at most $n(n-1)p\|\Delta\kappa\|_1/2$ (with equality if $p\Delta\kappa \le 1$), which is at most
$n^2 p \varepsilon/2$. It is easy to check that the actual number is tightly concentrated about
the mean, so

$d_{\mathrm{cut}}(G_n, G_n') \le \|\kappa_{G_n} - \kappa_{G_n'}\|_1 = \frac{2e(G_n \triangle G_n')}{n^2 p} \le 2\varepsilon$

holds with probability tending (rapidly) to 1 as $n \to \infty$. Using the finite-type
case to show that $d_{\mathrm{cut}}(G_n', \kappa_\varepsilon) \to 0$ and the bound $d_{\mathrm{cut}}(\kappa, \kappa_\varepsilon) \le \varepsilon$, and recalling
that $\varepsilon > 0$ was arbitrary, the result follows. □
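As an illustration only, the two-stage construction of $G_p(n, \kappa)$ recalled in this proof (independent uniform types, then independent edges) can be sketched in Python. The names `sample_sparse_kernel_graph` and `two_block_kernel` are hypothetical, and truncating the edge probability at 1 is an assumption suggested by the remark "with equality if $p\Delta\kappa \le 1$" rather than a definition taken from the text.

```python
import random


def sample_sparse_kernel_graph(n, p, kappa, rng=None):
    """Sketch of G_p(n, kappa): returns (types, edges).

    types[i] is the uniform type x_i of vertex i; each pair {i, j} becomes an
    edge independently with probability min(1, p * kappa(x_i, x_j)).
    The truncation at 1 is an assumption of this sketch.
    """
    rng = rng or random.Random()
    types = [rng.random() for _ in range(n)]
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            prob = min(1.0, p * kappa(types[i], types[j]))
            if rng.random() < prob:
                edges.add((i, j))
    return types, edges


def two_block_kernel(x, y):
    """A hypothetical finite-type kernel with two types [0, 1/2) and [1/2, 1]."""
    same_block = (x < 0.5) == (y < 0.5)
    return 1.5 if same_block else 0.5


if __name__ == "__main__":
    n, p = 400, 0.05
    _, edges = sample_sparse_kernel_graph(n, p, two_block_kernel, random.Random(0))
    # The mean of two_block_kernel is 1, so e(G_n) should be close to p * n(n-1)/2.
    print(len(edges), p * n * (n - 1) / 2)
```

The quadratic loop over pairs is the simplest faithful choice; it is adequate for small experiments but not for large $n$.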
5 Comparison between cut and count convergence

Throughout this section, we fix a function $p = p(n)$, and consider sequences
$(G_n)$ of graphs with $|G_n| = n$. In the dense case, with $p(n) = 1$ for all $n$,
Borgs, Chayes, Lovász, Sós and Vesztergombi [15] showed that such a sequence
converges to a kernel $\kappa$ in $d_{\mathrm{cut}}$ if and only if it converges to $\kappa$ in $d_{\mathrm{sub}}$; here we
wish to investigate whether this result can be extended to the sparse case. To do
this, we first have to make sense of the definitions. For $d_{\mathrm{cut}}$, as in the previous
section, we simply associate a kernel $\kappa_n$ to $G_n$ as before, with $\kappa_n$ taking the
values 0 and $1/p$. Then we use the usual definition of $d_{\mathrm{cut}}$ for (dense) kernels to
define $d_{\mathrm{cut}}(G_n, G_m)$ and $d_{\mathrm{cut}}(G_n, \kappa)$. In the light of Lemma 4.9, for questions
of convergence the metrics $d_{\mathrm{cut}}$ and $\hat d_{\mathrm{cut}}$ are equivalent; we shall use $d_{\mathrm{cut}}$ rather
than $\hat d_{\mathrm{cut}}$ in this section.
5.1 Admissible subgraphs and their counts

If $p = n^{-o(1)}$, then we use (17) and (15) to define $d_{\mathrm{sub}}$, so convergence in $d_{\mathrm{sub}}$
is equivalent to convergence of $s_p(F, G_n)$ for every graph $F$. For smaller $p$, as
noted in Subsection 3.5, it makes sense only to consider graphs $F$ in a certain
set $\mathcal{A}$ of admissible graphs. It is not quite clear exactly which graphs should be
admissible (see Subsection 3.5), so there are several variants of the definitions.
To keep things simple, we shall work here with one particular choice for the set
$\mathcal{A}$, depending on the function $p$. It may be that the various conjectures we shall
make, if true, extend to larger sets $\mathcal{A}$.

Recall that we write $\mathcal{F}$ for the set of isomorphism classes of finite (simple)
graphs. Given a loopless multi-graph $F$ and an integer $t \ge 1$, let $F_t$ denote the
graph obtained by subdividing each edge of $F$ exactly $t - 1$ times, so $e(F_t) = te(F)$
and $|F_t| = |F| + (t - 1)e(F)$. Writing $\mathcal{F}^{\mathrm{m}}$ for the set of isomorphism
classes of finite loopless multi-graphs, for $t \ge 2$ let

$\mathcal{F}_t = \{F_t : F \in \mathcal{F}^{\mathrm{m}}\}$,

and set $\mathcal{F}_1 = \mathcal{F}$ (not $\mathcal{F}^{\mathrm{m}}$). Thus, for $t \ge 2$, the family $\mathcal{F}_t$ is the set of simple
graphs that may be obtained as follows: starting with a set of paths of length
$t$, identify subsets of the endpoints of these paths in an arbitrary way, except
that the two endpoints of the same path may not be identified. Note that any
$F_t \in \mathcal{F}_t$ has girth at least $2t$.
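The subdivision operation $F \mapsto F_t$ is easy to make concrete. The following Python sketch (the function name `subdivide` and the edge-list representation are choices made here for illustration) builds $F_t$ from a loopless multi-graph and lets one check the identities $e(F_t) = te(F)$ and $|F_t| = |F| + (t-1)e(F)$ on small examples.

```python
def subdivide(num_vertices, edges, t):
    """Return (number of vertices, edge list) of F_t.

    The multi-graph F has vertices 0, ..., num_vertices - 1 and is given by a
    list of pairs (u, v); parallel edges are allowed, loops are not.
    Each edge is replaced by a path with t edges, i.e. subdivided t - 1 times.
    """
    if t < 1:
        raise ValueError("t must be at least 1")
    new_edges = []
    next_vertex = num_vertices  # labels for the new internal vertices
    for u, v in edges:
        if u == v:
            raise ValueError("loops are not allowed")
        path = [u]
        for _ in range(t - 1):
            path.append(next_vertex)
            next_vertex += 1
        path.append(v)
        new_edges.extend(zip(path, path[1:]))
    return next_vertex, new_edges


if __name__ == "__main__":
    # A double edge subdivided once is the 4-cycle, which has girth 2t = 4.
    print(subdivide(2, [(0, 1), (0, 1)], 2))
```

Starting from a double edge shows why multi-graphs are needed: the resulting cycle of length $2t$ attains the girth bound.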

Similarly, let $\mathcal{F}_{\ge t}$ be the set of simple graphs that may be obtained as above
but starting with paths of length at least $t$. Thus $\mathcal{F}_{\ge 1} = \mathcal{F}$ and, for $t \ge 2$, $\mathcal{F}_{\ge t}$
is the set of graphs that may be obtained from some $F \in \mathcal{F}^{\mathrm{m}}$ by subdividing
each edge at least $t - 1$ times. Note that $\mathcal{F} = \mathcal{F}_{\ge 1} \supset \mathcal{F}_{\ge 2} \supset \cdots$. Let $\mathcal{T}$ denote
the set of (isomorphism classes of) finite trees.

Throughout this subsection and the next we suppose that there is some
$\alpha > 0$ such that $np \ge n^{\alpha}$ for all large enough $n$. Equivalently, there is some
integer $t \ge 1$ such that

$n^{t-1} p^t \ge n^{-o(1)}$.    (35)

We shall set

$\mathcal{A} = \mathcal{T} \cup \mathcal{F}_{\ge t}$

for the smallest such $t$, noting that if $p = n^{-o(1)}$ then $t = 1$, so all graphs are
admissible. (An alternative that would work just as well is to let $\mathcal{A}$ be the set of
all subgraphs of graphs in $\mathcal{F}_{\ge t}$, which includes $\mathcal{T}$.) A key observation is that if
$F \in \mathcal{F}_{\ge t}$ then (considering the internal vertices on the paths making up $F$) we
have $|F| > e(F)(t-1)/t$. This also holds if $F \in \mathcal{T}$, or indeed if $F$ is a subgraph
of some $F' \in \mathcal{A}$. It follows that if $F \subset F' \in \mathcal{A}$ then

$E\,\mathrm{emb}(F, G(n,p)) \sim n^{|F|} p^{e(F)} = n^{|F| - e(F)(t-1)/t} (n^{t-1} p^t)^{e(F)/t} = n^{\Theta(1) - o(1)} \to \infty$.    (36)
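For a concrete instance of (35) and (36), consider the illustrative choice $p = n^{-2/3}$ (an assumption made only for this example; it satisfies $np = n^{1/3}$). The short computation below finds the smallest admissible $t$ and evaluates the expected embedding count for one admissible graph and, for contrast, for one graph that is not admissible.

```latex
% Illustrative assumption: p = n^{-2/3}, so np = n^{1/3}.
\begin{align*}
t = 1:&\quad n^{0}p^{1} = n^{-2/3}, \\
t = 2:&\quad n^{1}p^{2} = n^{-1/3}, \\
t = 3:&\quad n^{2}p^{3} = 1 \ge n^{-o(1)},
\end{align*}
% so the smallest t satisfying (35) is t = 3.
% F = C_6 (a double edge subdivided twice) lies in \mathcal{F}_3, with |F| = e(F) = 6:
\[
  n^{|F|}p^{e(F)} = n^{6 - 6\cdot 2/3}\,\bigl(n^{2}p^{3}\bigr)^{6/3} = n^{2} \to \infty .
\]
% By contrast, K_5 is not a subgraph of any graph in \mathcal{A}, and
\[
  n^{5}p^{10} = n^{5 - 20/3} = n^{-5/3} \to 0 .
\]
```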

On the one hand, $\mathcal{A} = \mathcal{T} \cup \mathcal{F}_{\ge t}$ is small enough to satisfy the requirements
for admissibility discussed in Subsection 3.5, including (36). (There may be
requirements we have missed, in which case $\mathcal{A} = \mathcal{T} \cup \mathcal{F}_{\ge t}$ for some larger $t$ is
likely to work.) On the other hand, as we shall now see, this set $\mathcal{A}$ is large
enough to ensure that the counts for $F \in \mathcal{A}$ determine a kernel, up to the
equivalence relation $\sim$ defined in Subsection 2.4.

Theorem 5.1. Let $\kappa_1$ and $\kappa_2$ be two bounded kernels, and $t \ge 1$ an odd integer.
Suppose that $s(F, \kappa_1) = s(F, \kappa_2)$ for every $F \in \mathcal{F}_t$. Then $\kappa_1 \sim \kappa_2$.

Proof. Given a kernel $\kappa$, let $\kappa^t$ be the kernel defined by

$\kappa^t(x, y) = \int_{[0,1]^{t-1}} \kappa(x, x_1)\kappa(x_1, x_2) \cdots \kappa(x_{t-1}, y)\, dx_1 \cdots dx_{t-1}$.    (37)

In other words, roughly speaking, $\kappa^t(x, y)$ counts the number of paths from $x$
to $y$ in $\kappa$ with length $t$. The key observation is that if $F$ is a graph, $\kappa$ a kernel,
and $t \ge 1$, then

$s(F_t, \kappa) = s(F, \kappa^t)$.    (38)

Indeed, $s(F_t, \kappa)$ is defined as an integral over one variable for each vertex of
$F_t$. We may evaluate this integral by first fixing the variables corresponding
to vertices of $F$, then using (37) once for each edge of $F$ to integrate over the
remaining variables. What remains is exactly the integral defining $s(F, \kappa^t)$.
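The identity (38) can be checked numerically for step kernels. The sketch below assumes a kernel that is constant on an $m$-by-$m$ grid of equal squares, stored as a symmetric matrix `K`, and takes $s(F, \kappa)$ to be the integral of the product of $\kappa$ over the edges of $F$; the helper names `density` and `kernel_power` are hypothetical.

```python
from itertools import product


def density(num_vertices, edges, K):
    """s(F, kappa) for the step kernel with symmetric m-by-m matrix K.

    The integral over [0,1]^{|F|} reduces to an average over all assignments
    of the vertices of F to the m equal-length blocks.
    """
    m = len(K)
    total = 0.0
    for blocks in product(range(m), repeat=num_vertices):
        term = 1.0
        for u, v in edges:
            term *= K[blocks[u]][blocks[v]]
        total += term
    return total / m ** num_vertices


def kernel_power(K, t):
    """Matrix of the step kernel kappa^t from (37): each integration over an
    intermediate variable is a matrix product divided by m."""
    m = len(K)
    result = [row[:] for row in K]
    for _ in range(t - 1):
        result = [[sum(result[a][b] * K[b][c] for b in range(m)) / m
                   for c in range(m)] for a in range(m)]
    return result


if __name__ == "__main__":
    K = [[1.0, 0.25], [0.25, 0.5]]
    triangle = [(0, 1), (1, 2), (0, 2)]
    # F_2 for the triangle is the 6-cycle with new vertices 3, 4, 5.
    hexagon = [(0, 3), (3, 1), (1, 4), (4, 2), (0, 5), (5, 2)]
    print(density(6, hexagon, K), density(3, triangle, kernel_power(K, 2)))
```

The two printed values agree up to rounding, as (38) predicts; the brute-force sum has $m^{|F|}$ terms, so this is only meant for very small examples.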

By assumption, $s(F, \kappa_1) = s(F, \kappa_2)$ for every $F \in \mathcal{F}_t$. Hence, from (38), we
have $s(F, \kappa_1^t) = s(F, \kappa_2^t)$ for every graph $F$, so, by Theorem 2.8 or Theorem 2.9,
$\kappa_1^t \sim \kappa_2^t$. Hence, from (12), there is a kernel $\kappa$ and measure-preserving maps
$\sigma_1, \sigma_2 : [0,1] \to [0,1]$ such that $(\kappa_i^t)^{(\sigma_i)} = \kappa$ a.e., for $i = 1, 2$. Since $(\kappa^{(\sigma)})^t = (\kappa^t)^{(\sigma)}$,
we thus have $(\kappa_1')^t = (\kappa_2')^t$ a.e. for $\kappa_i' = \kappa_i^{(\sigma_i)}$. Since $\kappa_i' \sim \kappa_i$, and
our aim is to prove that $\kappa_1 \sim \kappa_2$, it suffices to prove that $\kappa_1' \sim \kappa_2'$. Hence,
without loss of generality, we may replace $\kappa_i$ by $\kappa_i'$, so we have $\kappa_1^t = \kappa_2^t$ almost
everywhere. It is now a matter of simple analysis to deduce that $\kappa_1 = \kappa_2$ a.e.

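Given a bounded signed kernel, i.e., a bounded function $\kappa : [0,1]^2 \to \mathbb{R}$
satisfying $\kappa(x, y) = \kappa(y, x)$, let $T_\kappa$ be the corresponding operator on $L^2([0,1])$,
defined by

$(T_\kappa f)(x) = \int_0^1 \kappa(x, y) f(y)\, dy$.    (39)

From the Cauchy-Schwarz inequality we have

$\|T_\kappa f\|_2^2 = \int \left( \int \kappa(x, y) f(y)\, dy \right)^2 dx$
$\le \int \left( \int \kappa(x, y)^2\, dy \int f(y)^2\, dy \right) dx$
$= \|f\|_2^2 \int\!\!\int \kappa(x, y)^2\, dx\, dy = \|f\|_2^2 \|\kappa\|_2^2$,

so the operator norm of $T_\kappa$ on $L^2$ satisfies

$\|T_\kappa\| \le \|\kappa\|_2 < \infty$.    (40)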
Now let $\kappa$ be any bounded kernel, and $\varepsilon > 0$ a real number. By Lemma
there is some $k$ such that the kernel $\kappa_k$ obtained by averaging $\kappa$ over $2^{-k}$-by-$2^{-k}$
squares satisfies $\|\kappa - \kappa_k\|_2 \le \varepsilon$. Writing $T_\kappa = T_{\kappa_k} + T_{\kappa - \kappa_k}$, the first term has
finite rank, since $T_{\kappa_k} f$ is constant on intervals of length $2^{-k}$. From (40), the
second term has operator norm at most $\|\kappa - \kappa_k\|_2 \le \varepsilon$. It follows that the image
of the unit ball under $T_\kappa$ can be covered by a finite number of balls of radius $2\varepsilon$.
Since $\varepsilon$ was arbitrary, this shows that $T_\kappa$ is a compact operator.

Since $\kappa$ is symmetric, we also have that $T_\kappa$ is self-adjoint. Consequently, $T_{\kappa_1}$
is a compact self-adjoint operator on the Hilbert space $L^2(0, 1)$, so by standard
results (see, for example, Bollobás [5]) there is an orthonormal basis of eigenvectors
of $T_{\kappa_1}$, and all its eigenvalues are real. It is easy to see that $T_{\kappa_1^t} = (T_{\kappa_1})^t$,
so $T_{\kappa_1^t}$ acts on the $\lambda$-eigenspace of $T_{\kappa_1}$ by multiplication by $\lambda^t$. Since $t$ is odd
(so the map $\lambda \mapsto \lambda^t$ is injective), it follows that $T_{\kappa_1^t}$ has the same eigenspaces
as $T_{\kappa_1}$. Turning this around, the action of $T_{\kappa_1}$ on each eigenspace $E_\lambda$ of $T_{\kappa_1^t}$
with eigenvalue $\lambda$ is to multiply by $\lambda^{1/t}$. Thus, $T_{\kappa_1}$ is uniquely determined by
$T_{\kappa_1^t}$. In particular, since $\kappa_1^t = \kappa_2^t$ a.e., the operators $T_{\kappa_1}$ and $T_{\kappa_2}$ are equal, i.e.,
$\kappa_1 = \kappa_2$ a.e., as required. □

Note that in Theorem 5.1 the restriction to odd $t$ is essential, as shown by
the following example.

Example 5.2. Let $\kappa_1$ and $\kappa_2$ be the two 2-by-2 'chessboard' kernels defined by

$\kappa_1(x, y) = 1$ if $x < 1/2$, $y < 1/2$ or $x \ge 1/2$, $y \ge 1/2$, and $\kappa_1(x, y) = 0$ otherwise,

and

$\kappa_2(x, y) = 1$ if $x < 1/2$, $y \ge 1/2$ or $x \ge 1/2$, $y < 1/2$, and $\kappa_2(x, y) = 0$ otherwise.

Thus, in the dense case, $\kappa_1$ corresponds to the union of two disjoint complete
graphs on $n/2$ vertices, and $\kappa_2$ to the complete $n/2$-by-$n/2$ bipartite graph.
It is easy to check that for any connected graph $F$ we have $s(F, \kappa_1) = 2^{1-|F|}$, while
$s(F, \kappa_2) = 2^{1-|F|}$ if $F$ is bipartite, and $s(F, \kappa_2) = 0$ otherwise. In particular,
$s(F, \kappa_1) = s(F, \kappa_2)$ for all bipartite $F$, and hence for all $F \in \mathcal{F}_t$, $t$ even.
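In the language of the proof of Theorem 5.1, the failure for even $t$ can be seen directly from the spectra of the two chessboard kernels. The following short computation is a sketch; the function $h$, equal to $1$ on $[0,1/2)$ and $-1$ on $[1/2,1]$, is notation introduced here.

```latex
% h = 1 on [0,1/2) and h = -1 on [1/2,1]; \mathbf{1} is the constant function.
\begin{align*}
T_{\kappa_1}\mathbf{1} &= \tfrac12\,\mathbf{1}, & T_{\kappa_1}h &= \tfrac12\,h, \\
T_{\kappa_2}\mathbf{1} &= \tfrac12\,\mathbf{1}, & T_{\kappa_2}h &= -\tfrac12\,h,
\end{align*}
% and both operators vanish on the orthogonal complement of span{1, h}. Hence
\[
  \kappa_1^2(x,y) = \kappa_2^2(x,y) =
  \begin{cases}
    1/2 & \text{if $x$ and $y$ lie in the same half of $[0,1]$,}\\
    0   & \text{otherwise,}
  \end{cases}
\]
% although \kappa_1 \ne \kappa_2: the map \lambda \mapsto \lambda^t identifies the
% eigenvalues 1/2 and -1/2 exactly when t is even.
```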

As we saw from Lemma 4.2 and Corollary 4.7, bounded density is a natural
condition to impose on our sequence $(G_n)$ when dealing with $d_{\mathrm{cut}}$ for sparse
graphs. In the previous sections, when dealing with subgraph counts and $d_{\mathrm{sub}}$,
we imposed different conditions, the closest being Assumption 3.2. Let us restate
this here in the appropriate form when $p$ need not be as large as $n^{-o(1)}$.

Assumption 5.3 (exponentially bounded admissible subgraph counts).
There is a constant $C$ such that, for each fixed $F \in \mathcal{A}$, we have
$\limsup s_p(F, G_n) \le C^{e(F)}$ as $n \to \infty$.

Note that we impose a condition only for $F \in \mathcal{A}$. When comparing $d_{\mathrm{cut}}$ and
$d_{\mathrm{sub}}$, we need to impose both Assumption 4.1 (bounded density) and Assumption 5.3.
In the 'almost dense' case, when we take $\mathcal{A} = \mathcal{F}$, then Assumption 5.3
implies Assumption 4.1, with the same constant $C$. The argument is based on
showing that a not-too-small dense part of $G_n$ would contain too many $K_{t,t}$s
for some large $t$. Since the details are very similar to the proof of Lemma 3.5,
we omit them.

Unfortunately, in general neither of Assumptions 4.1 and 5.3 implies the
other. In one direction, this is easy to see: simply add a complete graph on $m$
vertices, where $m(n)$ is chosen so that $e(K_m) \sim m^2/2 = o(n^2 p)$. This does not
affect Assumption 4.1 but, if $m$ is chosen large enough, will create too many
copies of any fixed connected graph $F$ with $|F| \ge 3$. For the reverse direction,
consider the following example.

Example 5.4. Fix a real number $D > 1$, and let $\kappa = \kappa_D$ be the unbounded
kernel defined as follows. First partition $[0,1]$ into intervals $I_1, I_2, \ldots$, so that $I_i$
has length $2^{-i}$. Then set $\kappa(x, y) = i^{2/D}$ if $x, y \in I_i$, and $\kappa(x, y) = 0$ otherwise.
Let $F$ be a connected graph with average degree at most $D$. Then, since only
terms where all vertices are in the same $I_i$ contribute, we have

$s(F, \kappa) = \sum_{i=1}^{\infty} 2^{-i|F|} i^{2e(F)/D} \le \sum_{i=1}^{\infty} (2^{-i} i)^{|F|} \le \sum_{i=1}^{\infty} 2^{-i} i = 2$.

Let $G_n = G_p(n, \kappa)$ be the random graph defined from $\kappa$ as before. If every
component of any admissible graph has average degree at most $D$, then it is
easy to check that with probability 1 the sequence $(G_n)$ satisfies Assumption 5.3
(with $C = 2$). On the other hand, this sequence does not satisfy Assumption 4.1
since, for every $i$, there will be a subgraph of $G_n$ containing a positive fraction
of the vertices with density around $i^{2/D}$.
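The series bound in Example 5.4 is easy to confirm numerically. The sketch below assumes, as the displayed bound suggests, that the kernel takes the value $i^{2/D}$ on $I_i \times I_i$; the function name `example_count` and the truncation of the series at 200 terms are choices made here for illustration.

```python
def example_count(num_vertices, num_edges, D, terms=200):
    """Truncated series sum_i 2^{-i|F|} * i^{2 e(F) / D} for s(F, kappa_D).

    Assumes kappa_D = i^{2/D} on I_i x I_i with |I_i| = 2^{-i}; the tail
    beyond `terms` is negligible because of the factor 2^{-i|F|}.
    """
    return sum(2.0 ** (-i * num_vertices) * i ** (2.0 * num_edges / D)
               for i in range(1, terms + 1))


if __name__ == "__main__":
    # Reference value: sum_i i * 2^{-i} = 2.
    print(sum(i * 2.0 ** (-i) for i in range(1, 200)))
    # Triangle (|F| = 3, e(F) = 3, average degree 2) with D = 2.
    print(example_count(3, 3, 2.0))
    # 4-cycle (|F| = 4, e(F) = 4, average degree 2) with D = 2.
    print(example_count(4, 4, 2.0))
```

Both counts stay below 2, while the kernel itself is unbounded; this is the point of the example.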

With the choice of $\mathcal{A}$ made here, whenever $p = p(n)$ does not satisfy
$p = n^{-o(1)}$ then only trees and graphs in some $\mathcal{F}_{\ge t}$, $t \ge 2$, are admissible. All such
graphs, and all their components, have average degree less than 4, so the example
above shows that in this case, Assumption 5.3 does not imply Assumption 4.1.

Example 5.4 also shows that, in contrast to the almost dense case (where all
graphs are admissible), in general we cannot tell from the admissible subgraph
counts whether a kernel is bounded. For this reason, together with those discussed
above, when comparing $d_{\mathrm{cut}}$ and $d_{\mathrm{sub}}$ we impose both Assumptions 4.1
and 5.3.

5.2 Conjectured equivalence between cut and count convergence

Our main conjecture from Section 3 was that, in the sparse case, if the subgraph
counts converge, they converge to those of a kernel. In the present setting, we
consider counts for admissible subgraphs. Fix $p(n)$ satisfying (35), and a set
$\mathcal{A}$ of admissible graphs. By default we take $\mathcal{A} = \mathcal{T} \cup \mathcal{F}_{\ge t}$ as in the previous
subsection, although the definitions make sense for other sets $\mathcal{A}$. Throughout we
impose Assumptions 4.1 and 5.3 for some fixed constant $C$. Let $X = [0, \infty)^{\mathcal{A}}$,
let $s_p : \mathcal{F} \to X$ be the map defined by

$s_p(G_n) = (s_p(F, G_n))_{F \in \mathcal{A}} \in X$

for any graph $G_n$ with $n$ vertices, let $d$ be any metric on $X$ inducing the product
topology, and define $d_{\mathrm{sub}}$ by mapping to $X$ and then applying $d$; as usual, we
suppress the dependence on the normalizing function $p$. Note that $d_{\mathrm{sub}}$ is in
general a pseudo-metric rather than a metric: there may be non-isomorphic
graphs $G$, $G'$ with $s_p(F, G) = s_p(F, G')$ for all $F \in \mathcal{A}$. As we only consider
questions of convergence for sequences $G_n$ with $|G_n| \to \infty$, this will not be
relevant.
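Since $\mathcal{A}$ is countable, one standard choice of a metric inducing the product topology is a weighted sum of truncated coordinate differences. The Python sketch below illustrates this choice on finite truncations of the count vectors; the function name `product_metric` and the particular weights $2^{-(k+1)}$ are assumptions of the sketch, and any other metric inducing the same topology would serve equally well.

```python
def product_metric(x, y):
    """d(x, y) = sum_k 2^{-(k+1)} * min(1, |x_k - y_k|).

    x and y are sequences of counts s_p(F_k, G) indexed by a fixed
    enumeration F_0, F_1, ... of the admissible graphs, truncated to the
    same finite length for the purposes of this sketch.
    """
    if len(x) != len(y):
        raise ValueError("count vectors must have the same length")
    return sum(min(1.0, abs(a - b)) / 2 ** (k + 1)
               for k, (a, b) in enumerate(zip(x, y)))


if __name__ == "__main__":
    counts_g = [1.0, 0.98, 1.10]   # hypothetical counts for a graph G
    counts_h = [1.0, 1.02, 0.70]   # hypothetical counts for a graph H
    print(product_metric(counts_g, counts_h))
```

With this metric, convergence in $d$ is exactly coordinatewise convergence, which matches the use of $d_{\mathrm{sub}}$ in the text.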

Let $\mathcal{L} \subset X$ denote the set of possible limit points of sequences $s_p(G_n)$, where
$(G_n)$ satisfies our assumptions.

Recall that we write $\mathcal{K}$ for the space of kernels, that is, symmetric measurable
functions $\kappa : [0,1]^2 \to [0,C]$ quotiented by equivalence. There is a natural map
from $\mathcal{K}$ into $X$ given by subgraph counts; we write $s$ for this map, which does
not depend on $p$ (except through the choice of $\mathcal{A}$). Since $\mathcal{A}$ always contains
some set $\mathcal{F}_{\ge t}$, and hence some $\mathcal{F}_{t'}$ with $t'$ odd, Theorem 5.1 tells us that this
map is injective.

Our main conjecture is the following. 

Conjecture 5.5. With the assumptions and definitions above, we have

$\mathcal{L} \subset s(\mathcal{K})$.    (41)

Note that if $t = 1$ then $p = n^{-o(1)}$ and we recover Conjecture 3.3. Conjecture 5.5
seems to be the natural extension of Theorem 2.1 to functions $p = p(n)$
with $p \to 0$ but $np \ge n^{\alpha}$ for some $\alpha > 0$.

Turning to the equivalent of Theorem 2.4, we believe that in this setting the
notions of convergence given by $d_{\mathrm{sub}}$ and $d_{\mathrm{cut}}$ are equivalent. The most concrete
way of saying this is as follows; again we take $\mathcal{A} = \mathcal{T} \cup \mathcal{F}_{\ge t}$ by default, although
it might be that the conjecture fails for this $\mathcal{A}$ but holds for some other $\mathcal{A}$.

Conjecture 5.6. Let $(G_n)$ be a sequence satisfying Assumptions 4.1 and 5.3,
and let $\kappa \in \mathcal{K}$. Then $d_{\mathrm{cut}}(G_n, \kappa) \to 0$ if and only if $s_p(G_n) \to s(\kappa)$.

In this form, the conjecture implies (41) (see below). Without assuming
(41), it still makes sense to compare the notions of Cauchy sequences instead.

Conjecture 5.7. Let $(G_n)$ be a sequence satisfying Assumptions 4.1 and 5.3.
Then $(G_n)$ is Cauchy with respect to $d_{\mathrm{cut}}$ if and only if $(G_n)$ is Cauchy with
respect to $d_{\mathrm{sub}}$.

As we shall shortly see, Conjectures 5.6 and 5.7 are equivalent.

Although we cannot prove the conjectures above, we can say something.
Conjecture 5.6, for example, asserts two implications. Surprisingly, it is easy to
show that, if (41) holds, then either of these implications (for all sequences, not
just a particular sequence) implies the other! To prove this we shall first show
that the random graph $G_p(n, \kappa)$ behaves 'correctly' with respect to our definition
of $d_{\mathrm{sub}}$; the corresponding result for $d_{\mathrm{cut}}$ is Lemma 4.10.
Lemma 5.8. Fix $C > 0$, let $\kappa : [0,1]^2 \to [0,C]$ be a bounded kernel, and let
$G_n = G_p(n, \kappa)$. Then, with probability 1, the sequence $(G_n)$ satisfies Assumption 5.3,
and we have $s_p(G_n) \to s(\kappa)$.

Outline proof. The first statement follows from the second, since $s(F, \kappa) \le C^{e(F)}$
holds for every $F$, and in particular for $F \in \mathcal{A}$.

It is well known that if $F$ is a fixed graph, and $p' = p'(n)$ is a function of $n$,
then the number $X_F$ of subgraphs of $G(n, p')$ isomorphic to $F$ is concentrated
about its mean if and only if $E(X_{F'}) \to \infty$ for every subgraph $F'$ of $F$. (For early
results of this type see Bollobás [3] and Ruciński [35]; for more recent, much
stronger, results see Janson [^Hj and Janson, Oleszkiewicz and Ruciński [25].)

Our choice of the set $\mathcal{A}$ ensures that this holds for every $F \in \mathcal{A}$ with $p' = Cp$
(see (36)), proving the result if $\kappa$ is constant. It is straightforward to adapt this
result to finite type $\kappa$. It is easy to check that for the $F$ we consider, any $o(n^2 p)$
edges of $G_n \subset G(n, Cp)$ meet $o(n^{|F|} p^{e(F)})$ copies of $F$. Using this observation,
one can approximate the general case by the finite type case as in the proof of
Lemma 4.10. We omit the details. □

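Lemma 5.8 gives us a sequence tending in $d_{\mathrm{sub}}$ to any $\kappa \in \mathcal{K}$. In other words,
it shows that $\mathcal{L} \supset s(\mathcal{K})$. Hence, if (41) holds,

$\mathcal{L} = s(\mathcal{K})$.    (42)

Let $\mathcal{J} \subset \mathcal{K} \times \mathcal{L}$ denote the set of pairs $(\kappa, \lambda) \in \mathcal{K} \times \mathcal{L}$ such that there is a
sequence $(G_n)$ satisfying our assumptions with $d_{\mathrm{cut}}(G_n, \kappa) \to 0$ and $s_p(G_n) \to \lambda$.
Together, Lemmas 4.10 and 5.8 tell us much more than simply that
$\mathcal{L} \supset s(\mathcal{K})$: they show that the 'diagonal' $\mathcal{D} = \{(\kappa, s(\kappa)) : \kappa \in \mathcal{K}\}$ is contained
in $\mathcal{J}$.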

At this point, we have established three basic facts: 

FACT 1: Every subsequence of $(G_n)$ has a subsequence converging in $d_{\mathrm{sub}}$
to some point of $\mathcal{L}$. This is trivial, since Assumption 5.3 ensures that $s_p(G_n)$
lives in a compact subset of $X = [0, \infty)^{\mathcal{A}}$.

FACT 2: Every subsequence of $(G_n)$ has a subsequence converging in $d_{\mathrm{cut}}$
to some kernel $\kappa \in \mathcal{K}$. This is the first part of Corollary 4.7.

FACT 3: The map $s$ is an injection from $\mathcal{K}$ to $\mathcal{L}$. As noted above, this
follows from Theorem 5.1.

Facts 1 and 2 tell us that the relationship between the notions of convergence
in $d_{\mathrm{cut}}$ and $d_{\mathrm{sub}}$ is described by the set $\mathcal{J}$. Indeed, any subsequence of $(G_n)$
itself has a subsequence in which we have convergence in both these metrics, to
some point of $\mathcal{J}$.

Suppose for the moment that (42) holds. There are two possibilities.

If $\mathcal{J}$ is precisely the diagonal $\mathcal{D}$, then the three facts above easily imply that
Conjectures 5.6 and 5.7 both hold.

If $\mathcal{J} \ne \mathcal{D}$, then there is some off-diagonal point $(\kappa_1, \lambda)$ in $\mathcal{J}$. Since we are
assuming (42), we have $\lambda = s(\kappa_2)$ for some $\kappa_2 \in \mathcal{K}$. From the definition of
$\mathcal{J}$ there is a sequence $(G_n)$ satisfying our assumptions, with $d_{\mathrm{cut}}(G_n, \kappa_1) \to 0$
and $s_p(G_n) \to s(\kappa_2)$. Interleaving the sequence $G_n$ with the sequence $G_p(n, \kappa)$,
which converges to $\kappa$ in both $d_{\mathrm{cut}}$ and $d_{\mathrm{sub}}$, taking $\kappa = \kappa_1$ or $\kappa_2$, we find a
sequence which converges in one of $d_{\mathrm{cut}}$ or $d_{\mathrm{sub}}$ but not in the other. Hence,
neither implication in Conjecture 5.6 or 5.7 holds, i.e., these conjectures fail as
badly as possible.

In the light of the comments above, Conjecture 5.6 has the following rather
vague reformulation as a question.

Question 5.9. Given a definition of 'suitable' sequences $(G_n)$, let $\mathcal{C}$ be the set
of all graphs $F$ with the property that, whenever $\kappa$ is a bounded kernel and $(G_n)$
is a suitable sequence with $d_{\mathrm{cut}}(G_n, \kappa) \to 0$, then $s_p(F, G_n) \to s(F, \kappa)$. Under
what reasonable definition of 'suitable' is the set $\mathcal{C}$ large enough that the counts
$s(F, \kappa)$, $F \in \mathcal{C}$, determine a kernel $\kappa$ up to equivalence?

The point is that, if $\mathcal{C}$ is large enough, then the three facts above hold with
$\mathcal{A} = \mathcal{C}$, and we simply use $\mathcal{C}$ as the set of graphs whose counts we use to
define $d_{\mathrm{sub}}$. Then, for our 'suitable' sequences, $d_{\mathrm{cut}}$ convergence implies $d_{\mathrm{sub}}$
convergence to the same kernel by definition, so $(\kappa, \lambda) \in \mathcal{J}$ implies $\lambda = s(\kappa)$.
Thus (41) (and hence (42)) holds, and $\mathcal{J} = \mathcal{D}$, so $d_{\mathrm{sub}}$ convergence also implies
$d_{\mathrm{cut}}$ convergence. Unfortunately, there is no obvious single choice for the set of
suitable sequences. One could hope that sequences with bounded density would
do, but this is not the case: by adding a complete graph with many (but still
$o(pn^2)$) edges to $G(n, p)$, say, it is easy to check that in this case $\mathcal{C}$ consists only
of matchings. Conjecture 5.6 is more specific than Question 5.9, since we define
'suitable' by assuming $s_p(F, G_n)$ bounded for $F$ in some set $\mathcal{A}$, and then require
$\mathcal{C} \supset \mathcal{A}$.

If Conjecture 5.5 does not hold, then Conjectures 5.6 and 5.7 cannot hold.
Indeed, there is some $\lambda \in \mathcal{L}$ not corresponding to a kernel. Taking $G_n$ converging
to $\lambda$ in $d_{\mathrm{sub}}$, and then a subsequence that converges in $d_{\mathrm{cut}}$, there is some $\kappa$ with
$(\kappa, \lambda) \in \mathcal{J}$. Interleaving a corresponding sequence $(G_n)$ with $G_p(n, \kappa)$, we find
a sequence that converges in $d_{\mathrm{cut}}$ but not in $d_{\mathrm{sub}}$.

Even if Conjecture 5.5 does not hold, it is still possible that there is some
relationship between cut and subgraph convergence: it may be that every sequence
that is Cauchy with respect to $d_{\mathrm{sub}}$, and hence converges to some $\lambda \in \mathcal{L}$,
is Cauchy with respect to $d_{\mathrm{cut}}$, i.e., converges to some $\kappa \in \mathcal{K}$. This happens if
and only if, for every $\lambda \in \mathcal{L}$, there is a unique $\kappa \in \mathcal{K}$ such that $(\kappa, \lambda) \in \mathcal{J}$. This
is not as implausible as it may sound. Indeed, suppose Conjecture 5.6 holds
for some admissible set $\mathcal{A}^-$, but that the definitions involved make sense for a
larger set $\mathcal{A}^+$. It may be that (41) fails working with $\mathcal{A}^+$, because we are now
allowing as admissible some counts which need not converge to what we expect.
However, there is a restriction map from $\mathcal{L}^+$ to $\mathcal{L}^-$, forgetting about the counts
outside $\mathcal{A}^-$. Since (42) holds for the smaller set of admissible graphs, this would
show that for the larger set there is only one $\kappa$ for each $\lambda$, but not vice versa.

In the next section we shall prove a form of Conjecture 5.6. Before doing so,
let us briefly compare this conjecture with the corresponding result of Borgs,
Chayes, Lovász, Sós and Vesztergombi [15] for the dense case. In the dense
setting, as here, Facts 1 and 2 above are easy to prove. That all limiting
counts come from kernels was shown by Lovász and Szegedy [33]; this gives (42).
Surprisingly, the hard part is proving Fact 3, that the counts (now meaning all
counts) determine the kernel, up to equivalence as defined in Subsection 2.4.
(For us this was easy, since we deduced the sparse equivalent of this statement
from the dense result, Theorem 2.8.) Once one knows that the counts determine
the kernel, the 'meta-argument' above shows that $d_{\mathrm{cut}}$ convergence implies $d_{\mathrm{sub}}$
convergence if and only if the reverse implication holds. Since the forward
implication is very easy (see Corollary 2.3), the result of [15] that the two
metrics are equivalent follows. This gives a proof of this result in which the
only non-straightforward step is showing that the counts $s(F, \kappa)$ determine the
kernel $\kappa$ up to the appropriate notion of equivalence. One might expect this
uniqueness result to be easy, but this seems to be far from the case. Recently,
Borgs, Chayes and Lovász [12] gave a direct proof of this result (which, as noted
in Section 2, actually follows from the results of [Kj); their proof is far from
simple.

5.3 Partial results: embedding lemmas 

Our aim in this section is to prove a positive result, that under certain circumstances,
if $d_{\mathrm{cut}}(G_n, \kappa) \to 0$, then $s_p(F, G_n) \to s(F, \kappa)$ for certain graphs $F$. In
the case where $\kappa$ is of finite type, this is simply a counting lemma: in this case,
$G_n \to \kappa$ says that $G_n$ can be partitioned into $(\varepsilon, p)$-regular pairs with densities
given by $\kappa$. In the uniform case, Chung and Graham [17] proved such counting
lemmas for certain graphs under certain assumptions. The general case turns
out to be rather different, but we shall still use several of their ideas.

We start with the simplest case, where $F$ is a path. First we need some
definitions. As usual, in the proof it will be easier to consider homomorphisms
from $F$ to $G_n$ (i.e., walks in $G_n$) rather than embeddings. As we shall see later,
this makes no difference.

For $G_n$ a graph and $X_0, \ldots, X_\ell$ subsets of $V(G_n)$, let $G_n(X_0, X_1, \ldots, X_\ell)$
denote the number of $(\ell + 1)$-tuples $(v_i)$ with $v_i \in X_i$ and $v_i v_{i+1} \in E(G_n)$ for
$0 \le i \le \ell - 1$. Identifying a subset of $V(G_n)$ with a subset of $[0,1]$ as before,
for a kernel $\kappa$ let

$\kappa(X_0, X_1, \ldots, X_\ell) = \int_{X_0 \times \cdots \times X_\ell} \kappa(x_0, x_1)\kappa(x_1, x_2) \cdots \kappa(x_{\ell-1}, x_\ell)\, dx_0 \cdots dx_\ell$.



Lemma 5.10. Let $C > 0$ be constant, let $p(n)$ be any function of $n$ with $np \to \infty$,
and let $(G_n)$ be a sequence of graphs with $t_p(T, G_n)$ bounded for each tree
$T$. For every $\varepsilon > 0$ and $\ell \ge 1$ there is a $\delta = \delta_\ell(\varepsilon) > 0$ such that, whenever
$\kappa : [0,1]^2 \to [0,C]$ is a kernel with $\|\kappa_{G_n} - \kappa\|_{\mathrm{cut}} \le \delta$, then

$|G_n(X_0, X_1, \ldots, X_\ell) - n^{\ell+1} p^{\ell} \kappa(X_0, X_1, \ldots, X_\ell)| \le \varepsilon n^{\ell+1} p^{\ell}$

for any sets $X_0, X_1, \ldots, X_\ell \subset V(G_n)$.
Roughly speaking, the lemma says that if $G_n \to \kappa$ and $t_p(T, G_n)$ is bounded
for each $T$, then $t_p(P_\ell, G_n) \to s(P_\ell, \kappa)$. The stronger assertion makes it simpler
to prove the result by induction.
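The quantity $G_n(X_0, X_1, \ldots, X_\ell)$ counts walks whose $i$th vertex lies in $X_i$, and it can be computed one set at a time, which mirrors the inductive structure of the proof. The following Python sketch is illustrative; the function name `count_walks` and the adjacency-dictionary representation are choices made here.

```python
def count_walks(adj, sets):
    """Number of tuples (v_0, ..., v_l) with v_i in sets[i] and
    v_i v_{i+1} an edge for every i.

    adj maps each vertex to the set of its neighbours; sets is the list
    [X_0, ..., X_l] of vertex subsets.
    """
    # walks[v] = number of valid partial tuples (v_0, ..., v_i) ending at v
    walks = {v: 1 for v in sets[0]}
    for X in sets[1:]:
        extended = {}
        for v in X:
            total = sum(walks.get(u, 0) for u in adj[v])
            if total:
                extended[v] = total
        walks = extended
    return sum(walks.values())


if __name__ == "__main__":
    # Path on three vertices: 0 - 1 - 2.
    adj = {0: {1}, 1: {0, 2}, 2: {1}}
    V = {0, 1, 2}
    n, p, length = 3, 0.5, 2          # p is a hypothetical normalizing density
    walks = count_walks(adj, [V, V, V])
    # Normalized count, to be compared with kappa(X_0, X_1, X_2) in the lemma.
    print(walks, walks / (n ** (length + 1) * p ** length))
```

Each step costs at most the sum of the degrees of the vertices in $X_i$, so the whole count takes time $O(\ell\, e(G_n))$.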

Proof. Renormalizing, we may assume without loss of generality that C = 1. 
Let us do so from now on. 

The fact that $\delta$ is not allowed to depend on $\kappa$ allows us to assume without
loss of generality that $\kappa$ is piecewise constant on squares of side $1/n$, i.e., that $\kappa$
may be interpreted as a (dense) weighted graph with vertex set $V(G_n)$. Indeed,
the Frieze-Kannan form of Szemerédi's Lemma shows that there is an integer
$k$ such that, given any $\kappa$, there is a $\kappa'$ that is constant on squares of side $1/k$
with $d_{\mathrm{cut}}(\kappa, \kappa') \le \delta$. Tweaking $\kappa'$ slightly if $k$ does not divide $n$, we obtain a
kernel $\kappa''$ of the required form. Replacing $\delta$ by $2\delta$ as appropriate, the result
for $\kappa$ follows from the result for $\kappa''$. [Note that we implicitly assumed that $n$
is large here, meaning larger than some $n_0$ depending on $\varepsilon$ and $\ell$. We could
simply assume this in the statement of the lemma, but it can be achieved by
subdividing vertices. In fact, we could work with a kernel instead of a graph
throughout the proof.]

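Let

$\Delta(X_0, \ldots, X_\ell) = \frac{G_n(X_0, X_1, \ldots, X_\ell)}{n^{\ell+1} p^{\ell}} - \kappa(X_0, X_1, \ldots, X_\ell)$,

so our aim is to show that $|\Delta(X_0, \ldots, X_\ell)| \le \varepsilon$ for all choices of the sets $X_i$.
We shall show much more: let $M = \max_T \sup_n t_p(T, G_n)$, where the maximum
is over trees with at most $2\ell + 1$ vertices, noting that $M < \infty$. We shall show
that if $d_{\mathrm{cut}}(G_n, \kappa) \le \delta$, then, for any $1 \le t \le \ell$ and any $X_0, \ldots, X_t \subset V(G_n)$ we
have

$|\Delta(X_0, X_1, \ldots, X_t)| \le \varepsilon_t$,    (43)

where $\varepsilon_1 = \delta$, and

$\varepsilon_t = 7\sqrt{\varepsilon_{t-1}} + 2\sqrt{M}\,\varepsilon_{t-1}^{1/4}$

for $t \ge 2$. Since $\varepsilon_\ell$ tends to zero as $\delta \to 0$, taking $\delta$ small enough we have
$\delta = \varepsilon_1 < \varepsilon_2 < \cdots < \varepsilon_\ell < \varepsilon$, so to complete the proof of the lemma it suffices to
prove (43) for this choice of $\delta$.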

We shall prove (43) by induction on $t$. For $t = 1$, the result is immediate
from the definition of the cut norm: indeed, $\Delta(X_0, X_1)$ is one of the quantities
appearing in the supremum defining this norm. Suppose now that $2 \le t \le \ell$,
and that (43) holds with $t$ replaced by $t - 1$.

For $v \in V(G)$ and $X_1, \ldots, X_r \subset V(G)$, set

$\kappa(v, X_1, \ldots, X_r) = \int_{X_1 \times \cdots \times X_r} \kappa(x, x_1)\kappa(x_1, x_2) \cdots \kappa(x_{r-1}, x_r)\, dx_1 \cdots dx_r$,

where $x$ is any point of the interval of length $1/n$ corresponding to the vertex
$v$, and let

$\Delta(v, X_1, \ldots, X_r) = \frac{G_n(\{v\}, X_1, \ldots, X_r)}{n^r p^r} - \kappa(v, X_1, \ldots, X_r)$.    (44)
Note that

$\Delta(X, X_1, \ldots, X_r) = \frac{1}{n} \sum_{v \in X} \Delta(v, X_1, \ldots, X_r)$.    (45)

Fix $X_0, \ldots, X_t \subset V(G_n)$, and set $\eta = \sqrt{\varepsilon_{t-1}}$. Let $B_1$ be the set of $v \in X_1$ with
$\Delta(v, X_2, \ldots, X_t) > \eta$. Then, from (45), $\Delta(B_1, X_2, \ldots, X_t) > \eta |B_1|/n$. But by
the induction hypothesis, $\Delta(B_1, X_2, \ldots, X_t) \le \varepsilon_{t-1} = \eta^2$. Hence, $|B_1| \le \eta n$.
Arguing similarly, and using $\varepsilon_{t-1} \ge \varepsilon_1$, we see that the set $B$ of vertices $v \in X_1$
for which either $|\Delta(v, X_2, \ldots, X_t)| > \eta$ or $|\Delta(v, X_0)| > \eta$ holds has size at most
$4\eta n$.

If $v \in X_1 \setminus B$, then we have roughly the right number of walks through $v$,
i.e.,

$G_n(X_0, \{v\}, X_2, \ldots, X_t) = G_n(\{v\}, X_0)\, G_n(\{v\}, X_2, \ldots, X_t)$

is close to $np\,\kappa(v, X_0)\, n^{t-1} p^{t-1} \kappa(v, X_2, \ldots, X_t)$. More precisely, using the fact
that $\kappa$ is pointwise bounded by $C = 1$ to bound the $\kappa$ terms in the last expression
by 1, for $v \in X_1 \setminus B$ we have

$|\Delta(X_0, v, X_2, \ldots, X_t)| \le 3\eta$,    (46)

where the left hand side is defined by analogy with (44).
It remains to consider $v \in B$. For $i = 1, 2$, let

$\sigma_i = \sum_{v \in B} G_n(X_0, \{v\}, X_2, \ldots, X_t)^i$,

noting that $\sigma_1 \le \sqrt{|B| \sigma_2}$ by the Cauchy-Schwarz inequality. Let $T$ be the tree
with $2t$ edges formed by identifying the second vertices of two paths of length
$t$. Then $\sigma_2$ counts a subset of the homomorphisms from $T$ into $G_n$, so

$\sigma_2 \le \hom(T, G_n) = n^{2t+1} p^{2t} t_p(T, G_n) \le M n^{2t+1} p^{2t}$.

Since $|B| \le 4\eta n$ it follows that

$\sigma_1 \le \sqrt{|B| \sigma_2} \le 2\sqrt{M\eta}\, n^{t+1} p^{t}$.

Since $\kappa$ is bounded by 1, we have $\kappa(X_0, B, X_2, \ldots, X_t) \le \mu(B) \le 4\eta$, so

$|\Delta(X_0, B, X_2, \ldots, X_t)| \le 2\sqrt{M\eta} + 4\eta$.

Together with the bound (46) for $v \in X_1 \setminus B$ and (the equivalent of) (45), this
implies that

$|\Delta(X_0, X_1, \ldots, X_t)| \le 7\eta + 2\sqrt{M\eta} = \varepsilon_t$,

as required. This completes the proof of (43) by induction, and thus the proof
of the lemma. □

Note that the argument above works just as well for an arbitrary fixed tree
rather than a path: we pick some leaf $v$ to play the role of $x_0$; the unique
neighbour of $v$ then plays the role of $x_1$. This gives us a counting lemma for
trees.

Corollary 5.11. Let $(G_n)$ be a sequence of graphs with $t_p(T, G_n)$ bounded for
every tree $T$, and suppose that $d_{\mathrm{cut}}(G_n, \kappa) \to 0$, where $\kappa$ is a bounded kernel.
Then for each tree $T$ we have $t_p(T, G_n) \to s(T, \kappa)$ as $n \to \infty$. □

Chung and Graham [17] proved a version of this result (for paths rather
than trees) with $\kappa$ constant, under the assumption that the maximum degree
of $G_n$ is at most $Cpn$. This maximum degree assumption of course gives
$t_p(T, G_n) \le C^{e(T)}$, so it is stronger than the bounded tree counts assumption of
Lemma 5.10. In some sense, the maximum degree condition is much stronger,
but it turns out that our global assumption is just as good for questions involving
subgraph counts. The reason that Lemma 5.10 is more complicated than
the corresponding simple result in [17] is that $\kappa$ is not uniform, not that our
assumption is weaker.

We stated earlier that, in the sparse case, the parameter $s_p$ should be
preferred to $t_p$, even though $t_p$ tends to be easier to work with. Nevertheless,
in the case of trees, these parameters are equivalent, as shown by the
following observation.

Lemma 5.12. Let $p(n)$ be any function of $n$ with $np \to \infty$, and let $(G_n)$ be a
sequence with $s_p(T, G_n)$ bounded for every tree $T$. Then, for each tree $T$, we
have $t_p(T, G_n) \sim s_p(T, G_n)$. In particular, $t_p(T, G_n)$ is bounded.

Proof. Fix a tree $T$ with $k$ vertices. It suffices to show that the number $N_T$
of non-injective homomorphisms from $T$ to $G_n$ satisfies $N_T = o(n^k p^{k-1})$ as
$n \to \infty$. Now the image of any non-injective homomorphism $\phi$ from $T$ to $G_n$ is
a connected subgraph $H$ of $G_n$ with $\ell$ vertices, where $1 \le \ell \le k-1$. Any such
subgraph contains a tree $T'$ with $\ell$ vertices, so for each $\ell$ there are (crudely)
at most $\sum_{|T'|=\ell} \mathrm{emb}(T', G_n)$ possibilities for the vertex set of $H$, where the sum is
over all trees $T'$ with $\ell$ vertices. Since there are at most $k^k$ homomorphisms $\phi$
with image a given set of $\ell$ vertices, we thus have

$$N_T \le \sum_{\ell=1}^{k-1} k^k \sum_{|T'|=\ell} \mathrm{emb}(T', G_n).$$

Since $\mathrm{emb}(T', G_n) = n_{(|T'|)}\, p^{e(T')} s_p(T', G_n)$, the final term is $O(n^{\ell}p^{\ell-1})$ by
assumption. It follows that $N_T = O(n^{k-1}p^{k-2}) = o(n^k p^{k-1})$, as claimed. □
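
As a small illustration of the distinction between homomorphisms and embeddings that Lemma 5.12 addresses, the following Python sketch counts both by brute force. The helper names `count_hom_and_emb` and `complete_graph`, and the choice of complete host graphs (so $p = 1$), are assumptions made purely for illustration and do not come from the text; brute force is feasible only for tiny examples.

```python
from itertools import product


def count_hom_and_emb(pattern_edges, pattern_size, host_adj):
    """Count homomorphisms and embeddings of a small pattern graph.

    pattern_edges: list of pairs (a, b) of pattern vertices 0..pattern_size-1.
    host_adj: 0/1 adjacency matrix of the host graph.
    An embedding is an injective homomorphism.
    """
    n = len(host_adj)
    hom = emb = 0
    for image in product(range(n), repeat=pattern_size):
        if all(host_adj[image[a]][image[b]] for a, b in pattern_edges):
            hom += 1
            if len(set(image)) == pattern_size:
                emb += 1
    return hom, emb


def complete_graph(n):
    """Adjacency matrix of K_n (hypothetical host graph for the demo)."""
    return [[int(i != j) for j in range(n)] for i in range(n)]


# The path with three vertices and two edges, viewed as a tree T.
path3 = [(0, 1), (1, 2)]
for n in (4, 8, 16):
    hom, emb = count_hom_and_emb(path3, 3, complete_graph(n))
    # Fraction of homomorphisms that are not injective.
    print(n, hom, emb, (hom - emb) / hom)
```

For this path in $K_n$ there are $n(n-1)^2$ homomorphisms and $n(n-1)(n-2)$ embeddings, so the non-injective fraction is $1/(n-1)$, which vanishes as $n$ grows, in the spirit of the lemma.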

Lemma 5.12 allows us to restate Corollary 5.11 in terms of the parameter $s$.

Theorem 5.13. Let $(G_n)$ be a sequence of graphs with $s_p(T, G_n)$ bounded for
every tree $T$, and suppose that $d_{\mathrm{cut}}(G_n, \kappa) \to 0$, where $\kappa$ is a bounded kernel.
Then for each tree $T$ we have $s_p(T, G_n) \to s(T, \kappa)$ as $n \to \infty$. □

Theorem 5.13 may be regarded as an embedding lemma for trees. Our next
aim is to prove a much more general result. Chung and Graham showed that,
in the uniform case, if the number of paths of length $\ell - 1$ between any two
vertices is at most a constant times what it should be, then almost all pairs of
vertices are joined by almost the right number of paths of length $\ell$, and hence
$G_n$ contains asymptotically the expected number of copies of any $F \in \mathcal{F}_\ell$. This
result is much harder than the paths result, even in the uniform case. Although
we shall use the key idea of Chung and Graham, the proof does not carry over
in a simple way. In the following result, we work with $t_p$ rather than $s_p$ for
simplicity; we return to this later.

Theorem 5.14. Let $C > 0$ and $\ell \ge 3$ be fixed, and let $p = p(n)$ be any
function of $n$. Let $(G_n)$ be a sequence of graphs with $\sup_n t_p(F, G_n) < \infty$ for
each $F \in \mathcal{T} \cup \mathcal{F}_\ell \cup \{C_{2\ell-2}\}$, and suppose that $d_{\mathrm{cut}}(G_n, \kappa) \to 0$ for some kernel
$\kappa : [0,1]^2 \to [0, C]$. Then $t_p(F, G_n) \to s(F, \kappa)$ for each $F \in \mathcal{F}_\ell$.

Proof. Note that by Lemma 4.2 the sequence $(G_n)$ has density bounded by $C$,
i.e., it satisfies Assumption 4.1. Renormalizing, we shall assume without loss of
generality that $C = 1$.

Fix $\varepsilon > 0$, and a graph $F_\ell \in \mathcal{F}_\ell$. Let $\eta > 0$ be a small constant to be chosen
below (depending on $\varepsilon$, $\ell$ and $F_\ell$). By Lemma 4.5 there is some $K$ such that
for $n$ large enough, which we assume from now on, $G_n$ has an $(\eta, p)$-regular
partition $\Pi = (P_1, \ldots, P_k)$ for some $k = k(n) \le K$. Passing to a subsequence of
$(G_n)$, we may assume that $k$ is constant. As usual, we shall ignore rounding to
integers, assuming that each $P_i$ contains exactly $n/k$ vertices.

Passing to a subsequence (again), we may assume that for all $i$ and $j$ the
sequence $d_p(P_i, P_j)$ converges to some $\kappa'(P_i, P_j) \in [0, 1]$. Relabelling if necessary
so that $P_i$ consists of vertices $v$ with $in/k < v \le (i+1)n/k$, and identifying
vertices with corresponding subsets of $[0, 1]$ as usual, we may view $\kappa'$ as a kernel
on $[0,1]^2$.

If $n$ is large enough, which we assume, then each $d_p(P_i, P_j)$ is within $\varepsilon$
of $\kappa'(P_i, P_j)$. It follows that $d_{\mathrm{cut}}(G_n/\Pi, \kappa') \le \|G_n/\Pi - \kappa'\|_1 \le \varepsilon$. Under
our bounded density assumption 4.1, strong regularity implies weak regularity
(for suitably transformed parameters), so choosing $\eta$ small enough we have
$d_{\mathrm{cut}}(G_n, G_n/\Pi) \le \varepsilon$. Hence, choosing $n$ large enough, $d_{\mathrm{cut}}(\kappa, \kappa') \le d_{\mathrm{cut}}(\kappa, G_n) +
d_{\mathrm{cut}}(G_n, G_n/\Pi) + d_{\mathrm{cut}}(G_n/\Pi, \kappa') \le 3\varepsilon$. Hence, by Lemma for any fixed $F$
we have

$$|s(F, \kappa) - s(F, \kappa')| = O(\varepsilon),$$

so it suffices to show that $t_p(F_\ell, G_n)$ is close to $s(F_\ell, \kappa')$ rather than to $s(F_\ell, \kappa)$.
To avoid clutter in the notation, from now on we write $\kappa$ for the finite type
kernel $\kappa'$ defined above; the original $\kappa$ plays no further role in the proof. Recall
that $\kappa$ (formerly known as $\kappa'$) is bounded by 1. For $u \in P_i$ and $v \in P_j$ we shall
abuse notation by writing $\kappa(u, v) = \kappa(P_i, P_j)$ for the value of $\kappa$ at any point of
$[0,1]^2$ corresponding to $(u, v)$. Recall that $|d_p(P_i, P_j) - \kappa(P_i, P_j)| \le \varepsilon$ for all
$i$ and $j$.

For $v, w \in V(G_n)$ and $t \ge 1$, let $w_t(v, w)$ denote the number of walks of
length $t$ in $G_n$ starting at $v$ and ending at $w$; we suppress the dependence on
$G_n$ in the notation. Let $\kappa^t(v, w)$ denote the normalized 'expected' number of
such walks, if $G_n$ behaved like the random graph $G_p(n, \kappa)$. Let $U \subset V^2$ be the
set of pairs $(v, w)$ such that $w_\ell(v, w) \le (\kappa^\ell(v, w) - \varepsilon)n^{\ell-1}p^{\ell}$. We call the pairs
$(v, w) \in U$ underconnected, since they are joined by 'too few' walks of length $\ell$.
We shall show that

$$|U| = |\{(v, w) : w_\ell(v, w) \le (\kappa^\ell(v, w) - \varepsilon)n^{\ell-1}p^{\ell}\}| < \varepsilon n^2 \qquad (47)$$

if $\eta$ is chosen suitably, and then $n$ is taken large enough. Before doing so, let us
note that this implies the result.

By Lemma 5.10, if we choose $\eta$ small enough, then the total number of walks
of length $\ell$ in $G_n$ is within $\varepsilon n^{\ell+1}p^{\ell}$ of the expected number in $G_p(n, \kappa)$, namely
$\|\kappa^\ell\|_1 n^{\ell+1}p^{\ell}$. If (47) holds, then if we count only a maximum of $\kappa^\ell(v, w)n^{\ell-1}p^{\ell}$
walks for each pair $(v, w)$ of endpoints, we still count at least $(1 - \varepsilon)(\|\kappa^\ell\|_1 -
\varepsilon)n^{\ell+1}p^{\ell}$ walks, so there are at most $3\varepsilon n^{\ell+1}p^{\ell}$ walks uncounted, using $\|\kappa^\ell\|_1 \le 1$.
Writing $W$ for the set of overconnected pairs $(v, w) \in V^2$ with $w_\ell(v, w) \ge
(\kappa^\ell(v, w) + \sqrt{\varepsilon})n^{\ell-1}p^{\ell}$, it follows that

$$|W| \le 3\sqrt{\varepsilon}\, n^2. \qquad (48)$$

In other words, almost all pairs of vertices are joined by almost the right number
of walks.
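
The notion of an underconnected pair can be made concrete in the simplest case of the constant kernel $\kappa = 1$, for which $\kappa^\ell(v, w) = 1$ for every pair. The Python sketch below counts walks by repeated matrix multiplication and lists the pairs falling below the threshold $(1 - \varepsilon)n^{\ell-1}p^{\ell}$. The function names and the tiny test graph (a 5-cycle with $p$ chosen so that $e(G) = pn^2/2$) are illustrative assumptions only; nothing here is part of the proof.

```python
def mat_mul(a, b):
    """Multiply two square integer matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def walk_counts(adj, length):
    """Return W with W[v][w] = number of walks of the given length from v to w."""
    n = len(adj)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(length):
        result = mat_mul(result, adj)
    return result


def underconnected_pairs(adj, length, p, eps):
    """Ordered pairs (v, w) with w_l(v, w) <= (1 - eps) * n^(l-1) * p^l.

    This is the definition of U specialised to the constant kernel 1.
    """
    n = len(adj)
    threshold = (1 - eps) * n ** (length - 1) * p ** length
    walks = walk_counts(adj, length)
    return [(v, w) for v in range(n) for w in range(n)
            if walks[v][w] <= threshold]


def cycle_graph(n):
    """Adjacency matrix of the cycle C_n (hypothetical test graph)."""
    return [[int((i - j) % n in (1, n - 1)) for j in range(n)] for i in range(n)]


c5 = cycle_graph(5)
# 5 edges and n = 5, so p = 2 * 5 / 25 = 0.4 under the normalisation e(G) = p n^2 / 2.
print(underconnected_pairs(c5, 2, 0.4, 0.5))
```

In the 5-cycle, adjacent vertices are joined by no walks of length 2, so exactly the ordered adjacent pairs are reported; a graph this small and rigid is of course far from quasi-random.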

Recall that we fixed a graph $F_\ell \in \mathcal{F}_\ell$. Let $F_\ell$ be obtained by subdividing
the edges of a loopless multigraph $F$ with vertex set $u_1, \ldots, u_r$, so

$$\mathrm{hom}(F_\ell, G_n) = \sum_{v_1, \ldots, v_r \in V(G_n)} \prod_{u_iu_j \in E(F)} w_\ell(v_i, v_j), \qquad (49)$$

where the factors in the product corresponding to multiple edges of $F$ are of
course repeated. Given $u_iu_j \in E(F)$, let $2F_\ell/E_2$ be the graph formed from
two copies of $F_\ell$ by identifying the vertices corresponding to $u_i$ and identifying
the vertices corresponding to $u_j$. Since $2F_\ell/E_2 \in \mathcal{F}_\ell$, we have $t_p(2F_\ell/E_2, G_n)$
bounded. It follows by the Cauchy-Schwarz inequality that the number of
homomorphisms from $F_\ell$ into $G_n$ mapping $u_i$ and $u_j$ to a pair in $U \cup W$ is small,
in fact of order $\varepsilon^{1/4}n^{|F_\ell|}p^{e(F_\ell)}$; the argument is as in the proof of Lemma 5.10.
Since the comment above applies to any edge $u_iu_j$ of $F$, the contribution
to the sum in (49) from terms in which one or more pairs $(v_i, v_j)$ fall in $U \cup
W$ is small. But in the remaining terms, $w_\ell(v_i, v_j)$ is well approximated by
$\kappa^\ell(v_i, v_j)n^{\ell-1}p^{\ell}$, and it follows that $t_p(F_\ell, G_n)$ is close to $s(F_\ell, \kappa)$: the difference
is bounded by some function of $|F_\ell|$ and $\varepsilon$. In short, we have shown that to prove
the theorem, it suffices to prove (47), i.e., that there are few underconnected
pairs.

From now on, we forget the original graph $F_\ell$, and aim to prove (47), recalling
that $\kappa$ is a fixed finite-type kernel and that $G_n/\Pi$ is (pointwise) within $\varepsilon$ of $\kappa$,
where $\Pi = (P_1, \ldots, P_k)$ is our $(\eta, p)$-regular partition of $G_n$. It will be convenient
to assume that $\varepsilon$ is fairly small. In particular, we shall assume that $\varepsilon < 1/40$.

Recall that all but at most $\eta k^2$ pairs in our partition $(P_i)$ are $(\eta, p)$-regular.
Since all pairs have density at most $1 + \varepsilon \le 2$, the irregular pairs contain at
most $2\eta n^2 p$ edges. By assumption $t_p(T, G_n)$ is bounded for each tree $T$, and in
particular for the trees formed from two paths by identifying an edge from each,
so using Cauchy-Schwarz again a small set of edges meets only a small fraction
of the walks of length $\ell$ in $G_n$. In particular, the number of walks of length $\ell$
containing one or more edges from irregular pairs is $O(\sqrt{\eta}\, n^{\ell+1}p^{\ell})$. Taking $\eta$
small enough, we may assume that this quantity is less than $\varepsilon^2 n^{\ell+1}p^{\ell}/10$, say.
It follows that in proving (47), we may delete all edges in irregular pairs, i.e.,
we may assume that every pair is regular: if (47) holds for the resulting graph
$G'_n$ and kernel $\kappa'$ with $\varepsilon/2$ in place of $\varepsilon$, then (47) holds for our original graph
$G_n$ and kernel $\kappa$.

The lower bound in the proof of Lemma 5.10 used only closeness of the
graph and kernel in the cut norm, not the bounds on various tree counts. This
argument can thus be applied locally to sequences of parts of our partition.
Abusing notation, let us write $P_0, P_1, \ldots, P_{\ell-1}$ for an arbitrary sequence of $\ell$
parts of our partition, with repetition allowed. For any subsets $X_i \subset P_i$, we find
that there are at least

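$$\Bigl(\prod_{i=0}^{\ell-2} \kappa(P_i, P_{i+1}) \prod_{i=0}^{\ell-1} \frac{|X_i|}{|P_i|} - \gamma\Bigr) \frac{n^{\ell}p^{\ell-1}}{k^{\ell}}$$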

walks $v_0v_1 \cdots v_{\ell-1}$ with $v_i \in X_i$, where $\gamma = \gamma(\eta, \ell)$ tends to 0 as $\eta \to 0$. We
choose $\eta$ small enough that $\gamma < \varepsilon^{12}$. Taking $X_i = P_i$ for $i > 0$, and summing
over all choices for the intermediate parts, a consequence of this is that if $P_0$
and $P_{\ell-1}$ are any two parts, and $X_0$ is any subset of $P_0$, then there are at least

$$(\kappa^{\ell-1}(P_0, P_{\ell-1})|X_0|/|P_0| - \gamma)\, n^{\ell}p^{\ell-1}/k^2 \qquad (50)$$

walks of length $\ell - 1$ from $X_0$ to $P_{\ell-1}$.

Let us call a walk of length $\ell - 1$ in $G_n$ bad if there are at least $Mn^{\ell-2}p^{\ell-1}$
walks in $G_n$ with the same endpoints, where $M$ is a constant to be chosen in a
moment, depending on $\varepsilon$ but not on $\eta$; otherwise, the walk is good. Each bad
walk may be extended to at least $Mn^{\ell-2}p^{\ell-1}$ homomorphic images of $C_{2\ell-2}$. By
assumption, $t_p(C_{2\ell-2}, G_n)$ is bounded, so it follows that there are $O(n^{\ell}p^{\ell-1}/M)$
bad walks. In particular, choosing the constant $M$ large enough, we may assume
that there are at most $\varepsilon^9 n^{\ell}p^{\ell-1}/3$ bad walks.
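
The count of bad walks can be spelled out as follows. This is only a sketch of the arithmetic: the symbols $N_{\mathrm{bad}}$ (the number of bad walks) and $c$ (an upper bound for $t_p(C_{2\ell-2}, G_n)$) are names introduced here for illustration and do not appear in the text.

```latex
% A bad walk followed by the reverse of any walk of length \ell-1 with the same
% endpoints is a closed walk of length 2\ell-2, i.e. a homomorphic image of
% C_{2\ell-2}; distinct (bad walk, second walk) pairs give distinct homomorphisms.
\[
  N_{\mathrm{bad}} \cdot M n^{\ell-2} p^{\ell-1}
  \le \mathrm{hom}(C_{2\ell-2}, G_n)
  = n^{2\ell-2} p^{2\ell-2}\, t_p(C_{2\ell-2}, G_n)
  \le c\, n^{2\ell-2} p^{2\ell-2},
\]
\[
  \text{so}\quad
  N_{\mathrm{bad}} \le \frac{c}{M}\, n^{\ell} p^{\ell-1}
  \le \frac{\varepsilon^{9}}{3}\, n^{\ell} p^{\ell-1}
  \quad\text{whenever } M \ge 3c/\varepsilon^{9}.
\]
```

In particular $M$ may be chosen in terms of $\varepsilon$ and the bound $c$ alone, which is why it does not depend on $\eta$.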

Suppose for a contradiction that (47) does not hold, i.e., the set $U$ of
underconnected pairs of vertices has size at least $\varepsilon n^2$. Our first aim is to select a
pair $(P, P')$ of parts of our partition such that there are many underconnected
pairs $(u, v)$ in $P \times P'$, but not too many bad walks start in $P$. Since $|U| \ge \varepsilon n^2$
by assumption, there are at least $\varepsilon k/2$ parts $P$ with

$$|U \cap (P \times V)| \ge \varepsilon n^2/(2k). \qquad (51)$$

On the other hand, there are at most $\varepsilon k/3$ parts $P$ with the property that
more than $\varepsilon^8 n^{\ell}p^{\ell-1}/k$ bad walks start in $P$ (otherwise there would be too many
bad walks). Hence there exists a part $P$ for which (51) holds, with at most
$\varepsilon^8 n^{\ell}p^{\ell-1}/k$ bad walks starting in $P$. Fix such a $P$. From (51) and averaging,
there is a part $P'$ such that

$$|U \cap (P \times P')| \ge \varepsilon n^2/(2k^2) = \varepsilon|P||P'|/2. \qquad (52)$$

From now on, fix such a $P'$.

Let us say that a pair $(u, P'')$ with $u \in P$ and $P''$ a part of our partition is
deficient if there are fewer than $(\kappa^{\ell-1}(P, P'') - \sqrt{\gamma})n^{\ell-1}p^{\ell-1}/k$ walks of length
$\ell - 1$ from $u$ to $P''$, where $\gamma$ is as in (50). For a given $P''$, at most $\sqrt{\gamma}\,n/k$
vertices $u \in P$ form a deficient pair with $P''$: otherwise, the set $X_0$ of such
vertices would have more than $\gamma n^{\ell}p^{\ell-1}/k^2$ fewer walks to $P''$ than it should
have, contradicting (50). Hence, there are at most $\sqrt{\gamma}\,n$ deficient pairs. Let
$D \subset P$ be the set of vertices $u$ in more than $\gamma^{1/4}k$ deficient pairs. Then
$|D| \le \sqrt{\gamma}\,n/(\gamma^{1/4}k) = \gamma^{1/4}|P|$.

Let us say that a pair $(u, P'')$ with $u \in P$ and $P''$ a part of our partition is
compromised if there are more than $\varepsilon^3 n^{\ell-1}p^{\ell-1}/k$ bad walks from $u$ to $P''$. Since
at most $\varepsilon^8 n^{\ell}p^{\ell-1}/k$ bad walks start in $P$, there are at most $\varepsilon^5 n$ compromised
pairs. Let $C$ be the set of $u \in P$ in more than $\varepsilon^3 k$ compromised pairs; then
$|C| \le \varepsilon^2 n/k = \varepsilon^2|P|$.

Let $S \subset P$ be the set of vertices $u$ for which there are at least $\varepsilon|P'|/4$ vertices
$v \in P'$ with $(u, v) \in U$. By (52) we have

$$\varepsilon|P||P'|/2 \le |U \cap (P \times P')| \le |S||P'| + \varepsilon|P||P'|/4,$$

so $|S| \ge \varepsilon|P|/4 > (\gamma^{1/4} + \varepsilon^2)|P|$. Thus $|S| > |D| + |C|$, and there is some $u$ in
$S \setminus (D \cup C)$. Fix such a $u$ for the rest of the proof, and let $U_u$ denote the set of
$v \in P'$ for which $(u, v)$ is underconnected.

At this point we have chosen a vertex $u \in P$, a part $P'$, and a set $U_u \subset P'$
with the following properties:

(i) for each $v \in U_u$, there are at most $(\kappa^\ell(u, v) - \varepsilon)n^{\ell-1}p^{\ell} = (\kappa^\ell(P, P') -
\varepsilon)n^{\ell-1}p^{\ell}$ walks of length $\ell$ from $u$ to $v$,

(ii) $|U_u| \ge \varepsilon|P'|/4$,

(iii) there are at most $\gamma^{1/4}k \le \varepsilon^3 k$ deficient pairs $(u, P'')$,

(iv) there are at most $\varepsilon^3 k$ compromised pairs $(u, P'')$.

From (i) and (ii) above, there are at least $m = \varepsilon^2 n^{\ell}p^{\ell}/(4k)$ 'missing walks'
from $u$ to $U_u$: the number of walks of length $\ell$ from $u$ to $U_u$ falls short of
the expected number in $G_p(n, \kappa)$ by at least $m$. Let $P''$ be any part of our
partition. By a $u$-$U_u$ walk via $P''$ we mean a walk of length $\ell$ from $u$ to $U_u$
whose second last vertex lies in $P''$; the expected number of such walks is $N_{P''} =
\kappa^{\ell-1}(P, P'')\kappa(P'', P')|U_u|n^{\ell-1}p^{\ell}/k$. Note that $\sum_{P''} N_{P''}$ is simply the expected
number of walks from $u$ to $U_u$. Let $m_{P''}$ be the number of 'missing walks via
$P''$', i.e., the difference between $N_{P''}$ and the number of $u$-$U_u$ walks via $P''$, or
zero if there are at least $N_{P''}$ such walks. The total number of missing walks is
at most the sum of the numbers $m_{P''}$, so

$$\sum_{P''} m_{P''} \ge m \ge \varepsilon^2 n^{\ell}p^{\ell}/(4k).$$

Let us say that $P''$ is useful if $m_{P''} \ge \varepsilon^2 n^{\ell}p^{\ell}/(8k^2)$, so the contribution to
the sum above from non-useful parts $P''$ is at most half the right hand side.
Recalling that we have normalized so that $\kappa$ is bounded by 1, and that $\varepsilon < 1/40$,
for each $P''$ we have $m_{P''} \le N_{P''} \le n^{\ell}p^{\ell}/k^2$; it follows that there are at least
$\varepsilon^2 k/8 > 5\varepsilon^3 k$ useful parts $P''$.

Using (iii) and (iv) above, it follows that there is a part $P''$ which is useful,
but neither deficient nor compromised. Fix such a part $P''$.

Recall that a walk of length $\ell - 1$ from $u$ to $w \in P''$ is good if it is not bad,
i.e., if

$$w_{\ell-1}(u, w) < N = Mn^{\ell-2}p^{\ell-1}. \qquad (53)$$

Since $\gamma^{1/4} \le \varepsilon^3$, and $P''$ is neither deficient nor compromised, there are at least

$$(\kappa^{\ell-1}(P, P'') - 2\varepsilon^3)\, n^{\ell-1}p^{\ell-1}/k$$

good walks from $u$ to $P''$. On the other hand, there are many missing walks
via $P''$. With this setup, we are finally ready to apply the key idea of Chung
and Graham [17], which is to partition the set $P''$ into subsets according to
the approximate number of walks from $u$ to the relevant vertex, and then use
regularity to show that there are about the right number of walks from $U_u$ to
each such subset. In fact, there is a slick way of doing this.

Figure 1: The set $P''$ is subdivided into sets $A_i$, with $w_{\ell-1}(u, v) = i$ for each
$v \in A_i$. Each edge from $A_i$ to $U_u$ contributes $i$ walks from $u$ to $U_u$ via $P''$.

For $i \ge 0$, let $A_i$ be the set of vertices $v \in P''$ with $w_{\ell-1}(u, v) = i$; see
Figure 1. Also, let $A_t^+ = \bigcup_{i \ge t} A_i$. Then

$$w_{\ell-1}(u, P'') = \sum_{i \ge 0} i|A_i| = \sum_{t \ge 1} |A_t^+|.$$

More importantly, $\sum_{t=1}^{N} |A_t^+|$ is at least the number of good walks from $u$ to $P''$,
so

$$\sum_{t=1}^{N} |A_t^+| \ge (\kappa^{\ell-1}(P, P'') - 2\varepsilon^3)\, n^{\ell-1}p^{\ell-1}/k. \qquad (54)$$

Since $(P'', P')$ is $(\eta, p)$-regular with (normalized) density $\kappa(P'', P') \le 1$, if $A \subset
P''$ and $B \subset P'$ then $e(A, B) \ge p\kappa(P'', P')|A||B| - \eta p(n/k)^2$ (this is trivially
true if one of $A$ or $B$ has size less than $\eta n/k$). Since each edge from $U_u$ to $A_i$
forms the final edge of exactly $i$ walks from $u$ to $U_u$, the number of walks from
$u$ to $U_u$ via $P''$ is given by

$$\begin{aligned}
\sum_{i \ge 1} i\, e(A_i, U_u) &\ge \sum_{t=1}^{N} e(A_t^+, U_u) \\
&\ge \sum_{t=1}^{N} \bigl(p\kappa(P'', P')|A_t^+||U_u| - \eta p(n/k)^2\bigr) \\
&\ge (\kappa^{\ell-1}(P, P'') - 2\varepsilon^3)\kappa(P'', P')|U_u|\, n^{\ell-1}p^{\ell}/k - \eta N p(n/k)^2,
\end{aligned}$$

where we used (54) in the last step. The main term is simply the expected
number of walks from $u$ to $U_u$ via $P''$, so the conclusion is that there are at
most

$$2\varepsilon^3 \kappa(P'', P')|U_u|\, n^{\ell-1}p^{\ell}/k + \eta N p(n/k)^2 \qquad (55)$$

missing walks from $u$ to $U_u$ via $P''$. The two terms above may be bounded above
by $2\varepsilon^3 n^{\ell}p^{\ell}/k^2$ and, recalling (53), $\eta M n^{\ell}p^{\ell}/k^2$, respectively. Choosing $\eta < \varepsilon^3/M$
we thus have at most $3\varepsilon^3 n^{\ell}p^{\ell}/k^2$ missing walks via $P''$, i.e., $m_{P''} \le 3\varepsilon^3 n^{\ell}p^{\ell}/k^2$,
which contradicts the fact that $P''$ is useful. This contradiction completes the
proof. □
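
The layering step in the proof, where the vertices of $P''$ are grouped by the number of walks reaching them from $u$, rests on an elementary counting identity. The Python sketch below checks it on made-up data: the list `counts` stands for hypothetical values of $w_{\ell-1}(u, v)$ over $v \in P''$, and `cap` plays the role of $N$. All names are illustrative assumptions, not notation from the text.

```python
def layer_sizes(counts):
    """Return [|A_1^+|, |A_2^+|, ...], where A_t^+ = {v : counts[v] >= t}."""
    top = max(counts, default=0)
    return [sum(1 for c in counts if c >= t) for t in range(1, top + 1)]


def truncated_layer_sum(counts, cap):
    """Sum of |A_t^+| over 1 <= t <= cap; this equals sum(min(c, cap))."""
    return sum(layer_sizes(counts)[:cap])


def good_walks(counts, cap):
    """Total number of walks ending at vertices reached by fewer than cap walks."""
    return sum(c for c in counts if c < cap)


counts = [0, 1, 1, 3, 5, 2]  # hypothetical walk counts from u to vertices of P''
cap = 3                      # plays the role of N

# Total walks equal the sum of all layer sizes.
print(sum(counts), sum(layer_sizes(counts)))
# The truncated layer sum is at least the number of good walks.
print(truncated_layer_sum(counts, cap), good_walks(counts, cap))
```

Truncating the layers at `cap` replaces each count $c$ by $\min(c, N)$, which is why the truncated sum still dominates the number of good walks while each layer contributes at most one error term from regularity.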

Note that the argument above does not extend to $\ell = 2$, and not only because
$C_2$ makes no sense. The problem is that we cannot define $N$ as in (53) (this
quantity is now $o(1)$), but must take $N = 1$ instead, and then the second term
in (55) is too large.

The proof of Theorem 5.14 actually gives rather more with almost no extra
work.

Theorem 5.15. Let $C > 0$ and $\ell \ge 3$ be fixed, and let $p = p(n)$ be any function
of $n$. Let $(G_n)$ be a sequence of graphs with $\sup_n t_p(F, G_n) < \infty$ for each
$F \in \mathcal{T} \cup \mathcal{F}_{\ge\ell} \cup \{C_{2\ell-2}\}$, and suppose that $d_{\mathrm{cut}}(G_n, \kappa) \to 0$ for some bounded
kernel $\kappa$. Then $t_p(F, G_n) \to s(F, \kappa)$ for each $F \in \mathcal{T} \cup \mathcal{F}_{\ge\ell}$.

Proof. The conclusion for $F \in \mathcal{T}$ follows from Corollary 5.11.

Fix $F \in \mathcal{F}_{\ge\ell}$ and $\varepsilon > 0$, and let $L$ be the length of the longest induced
path in $F$. Noting that for $t > \ell$ we have $C_{2t-2} \in \mathcal{F}_{\ge\ell}$, the hypotheses of
Theorem 5.14 are satisfied with $\ell$ replaced by any $t$ in the range $\ell \le t \le L$. The
proof of that result thus shows that if $\eta$ is chosen small enough, then when we
take an $(\eta, p)$-regular partition of $G_n$ with associated kernel $\kappa'$, almost all pairs
$(v, w)$ of vertices are joined by almost the 'right' number of walks of each length
$t$, $\ell \le t \le L$. More precisely, writing $\kappa$ for $\kappa'$ as in the proof of Theorem 5.14,
and writing $U_t$ for the set of pairs $(v, w)$ with $w_t(v, w) \le (\kappa^t(v, w) - \varepsilon)n^{t-1}p^{t}$
and $W_t$ for the set of pairs with $w_t(v, w) \ge (\kappa^t(v, w) + \sqrt{\varepsilon})n^{t-1}p^{t}$, the proof of
Theorem 5.14 shows that $|U_t| \le \varepsilon n^2$ for $\ell \le t \le L$, and (hence) that $|W_t| \le
3\sqrt{\varepsilon}\, n^2$ for each $t$ in this range. Using the analogue of (49) in which each term
$w_\ell(\cdot,\cdot)$ is replaced by an appropriate term $w_t(\cdot,\cdot)$, as before we can use the
Cauchy-Schwarz inequality to show that the contribution to $t_p(F, G_n)$ from
terms with some pair $(v_i, v_j)$ in the small set $\bigcup_t U_t \cup W_t$ is small (of order $\varepsilon^{1/4}$),
and it follows as before that if $\eta$ is small enough, then $|t_p(F, G_n) - s(F, \kappa)|$ is
bounded by some function of $F$ and $\varepsilon$, giving the result. □

Let us note for later reference that, in one way, the assumptions of Theorems
5.14 and 5.15 are weaker than they may first appear. Let $F$ be a loopless
multigraph with vertex set $u_1, u_2, \ldots, u_k$, and let $F_\ell \in \mathcal{F}_\ell$ be obtained by
subdividing each edge of $F$ exactly $\ell - 1$ times. Then (49) may be rewritten as

$$\mathrm{hom}(F_\ell, G_n) = n^k\, \mathbb{E}\Bigl(\prod_{u_iu_j \in E(F)} w_\ell(v_i, v_j)\Bigr),$$

where the expectation is over the uniform choice of $(v_1, v_2, \ldots, v_k) \in V(G_n)^k$.
Applying Hölder's inequality, $\mathbb{E}(\prod_{i=1}^{r} X_i) \le (\prod_{i=1}^{r} \mathbb{E}(|X_i|^r))^{1/r}$, with $r = e(F)$, it
follows that

$$\mathrm{hom}(F_\ell, G_n)^r \le n^{kr} \prod_{u_iu_j \in E(F)} \mathbb{E}(w_\ell(v_i, v_j)^r) = n^{kr}\, \mathbb{E}(w_\ell(v_1, v_2)^r)^r
= n^{kr-2r}\, \mathrm{hom}(H_{r,\ell}, G_n)^r, \qquad (56)$$

where $H_{r,\ell} \in \mathcal{F}_\ell$ is the 'theta graph' consisting of $r$ internally vertex disjoint
paths of length $\ell$ joining the same pair of vertices. The normalizing factors work
out correctly, so we have

$$t_p(F_\ell, G_n) \le t_p(H_{r,\ell}, G_n). \qquad (57)$$

Hence, the condition that $t_p(F, G_n)$ remain bounded for every $F \in \mathcal{F}_\ell$ is
equivalent to the condition that $t_p(F, G_n)$ is bounded for $F = H_{r,\ell}$, $r = 1, 2, \ldots$.
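
For the reader who wishes to check that the normalizing factors do work out, the computation can be sketched as follows. It assumes the normalization $t_p(F, G) = \mathrm{hom}(F, G)/(n^{|F|}p^{e(F)})$, which is how $t_p$ is used earlier in this section, and takes $r$-th roots in (56).

```latex
% F_\ell has k + r(\ell-1) vertices and r\ell edges;
% H_{r,\ell} has 2 + r(\ell-1) vertices and r\ell edges.
\[
  t_p(F_\ell, G_n)
  = \frac{\mathrm{hom}(F_\ell, G_n)}{n^{k + r(\ell-1)}\, p^{r\ell}}
  \le \frac{n^{k-2}\, \mathrm{hom}(H_{r,\ell}, G_n)}{n^{k + r(\ell-1)}\, p^{r\ell}}
  = \frac{\mathrm{hom}(H_{r,\ell}, G_n)}{n^{2 + r(\ell-1)}\, p^{r\ell}}
  = t_p(H_{r,\ell}, G_n).
\]
```

The powers of $p$ agree because subdividing does not change the number of edges between the two sides, and the surplus factor $n^{k-2}$ accounts exactly for the $k - 2$ extra branch vertices of $F_\ell$.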

Arguing similarly, for any $F \in \mathcal{F}_{\ge\ell}$ we may bound $t_p(F, G_n)$ in terms of the
quantities $t_p(H_{r,\ell'}, G_n)$, where $\ell'$ ranges over the lengths of the paths making up
$F$. Hence, to show that $t_p(F, G_n)$ is bounded for all $F \in \mathcal{F}_{\ge\ell}$, it suffices to prove
the same condition for the graphs $H_{r,\ell'}$, $r \ge 1$, $\ell' \ge \ell$. Note that these latter
conditions are simply moment conditions on the numbers of walks of various
lengths joining a random pair of vertices of $G_n$.

In the case where the limiting kernel $\kappa$ is of finite type, Theorem 5.15 may
be seen as a form of counting lemma. In this case, it is easy to strengthen the
result to count homomorphisms from $F$ into $G_n$ with each vertex mapped to a
specified part of the partition of $G_n$ corresponding to the finite type kernel $\kappa$,
obtaining a result similar in form to Lemma 5.10. Such a (strengthened) finite
type case of Theorem 5.14 or Theorem 5.15 is very much easier to prove than
the general case: there is no need to apply Szemerédi's Lemma, and the proof of
the result of Chung and Graham [17] for the uniform case goes through without
much modification. One might hope that, using Szemerédi's Lemma, the full
generality of Theorem 5.15 would follow easily from the finite type case, but
this is not true. The problem is that our assumptions are inescapably global:
we assume, for instance, that the number of copies of $C_{2\ell-2}$ in $G_n$ is bounded
by a multiple of the expected number of copies. When we take an $(\varepsilon, p)$-regular
partition, this gives no useful information about the number of copies of $C_{2\ell-2}$
in each regular pair: we have a bound that is of the form $Mk^{2\ell-2}$ times the
expected number of copies, where $k$ is the number of parts. To apply the finite
type case, we would need a bound independent of $k$. For this reason there seems
to be no easy way around the work in the proof of Theorem 5.14.

Theorem 5.15 may be seen as some progress towards a proof of some form
of Conjecture 5.6. More precisely, it is almost an answer to Question 5.9; the
only problem is that for Theorem 5.15 we work with $t_p$ rather than $s_p$. We shall
return to this in detail in a moment. However, even ignoring this, Theorem 5.15
is a little disappointing in some ways. Let $\mathcal{A} = \mathcal{T} \cup \mathcal{F}_{\ge\ell}$. Assuming boundedness
of $t_p(F, G_n)$ for $F \in \mathcal{A} \cup \{C_{2\ell-2}\}$, we obtain convergence of the counts $t_p(F, G_n)$
for $F \in \mathcal{A}$. The extra assumption for $F = C_{2\ell-2}$ is somehow annoying. This
is perhaps clearest if we consider the range where $p$ is fairly large, say $n^{-o(1)}$.
In this case $s_p \sim t_p$, and it makes sense to assume boundedness of all counts
$s_p(F, G_n)$. However, since $C_2$ does not make sense, the smallest value of $\ell$ for
which we can apply Theorem 5.15 is $\ell = 3$, and we obtain convergence of the
counts $s_p(F, G_n)$ for $F \in \mathcal{F}_{\ge 3}$. In comparison, Theorem 3.20 shows that with
the counts $s_p$ bounded, and $s_p(C_4, G_n) \to 1$, which should roughly correspond
to convergence to the uniform kernel $\kappa = 1$, we obtain $s_p(F, G_n) \to s(F, \kappa) = 1$
for all $F \in \mathcal{F}_{\ge 2}$, rather than just for $F \in \mathcal{F}_{\ge 3}$.

In fact, Theorem 3.20 gives much more: it gives convergence for all $F$ with
girth at least 4. Chung and Graham [17] asked whether an analogous result holds
for sparse graphs under the appropriate assumptions (what they call 'f-quasi
randomness', which corresponds roughly to the assumptions of Theorem 5.14
with $\kappa$ constant), with girth at least 4 replaced by girth at least $2\ell$. In our
language, they asked whether (when $\kappa = 1$) the conclusion of Theorem 5.14 can
be extended to all $F$ with girth at least $2\ell$. Unfortunately, the answer is no for
a trivial reason, namely that there are graphs $F$ with arbitrarily large girth and
arbitrarily large average degree. Taking $p = n^{-\alpha}$ for some $0 < \alpha < 1$, and $d$ large
enough, for any graph $F$ with average degree $d$ the expected number of copies
of $F$ in $G_n = G(n, p)$ is $o(1)$, so the normalizing constant in the definition of
$t_p(F, G_n)$ is $o(1)$. Since $\mathrm{hom}(F, G_n)$ is an integer, we cannot have $t_p(F, G_n) \to 1$
in this case.

5.4 Embeddings or homomorphisms? 

In this subsection we return to the use of $t_p$ rather than $s_p$ in Theorems 5.14
and 5.15. Although this simplifies the proof, it is unsatisfactory for a reason
we shall now explain. We start by discussing the analogous problem with the
corresponding result of Chung and Graham [17], their Theorem 8. We shall use
the following fact, proved by Blakley and Roy [5] in a slightly more general form
in the context of symmetric matrices.

Theorem 5.16. Let $G$ be a graph with $n$ vertices and average degree $d$. Then
$G$ contains at least $nd^{\ell}$ walks of length $\ell$. □
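
Theorem 5.16 is easy to test numerically on small graphs. The following Python sketch counts all walks of a given length (as sequences of $\ell + 1$ vertices, so each edge yields two walks of length 1) and compares the total with $nd^{\ell}$. The function names and the star graph used as a test case are assumptions chosen for illustration only.

```python
def total_walks(adj, length):
    """Number of walks with `length` edges, counted as vertex sequences."""
    n = len(adj)
    # counts[v] = number of walks of the current length starting at v.
    counts = [1] * n
    for _ in range(length):
        counts = [sum(adj[v][w] * counts[w] for w in range(n)) for v in range(n)]
    return sum(counts)


def average_degree(adj):
    """Average degree d = (sum of all degrees) / n."""
    return sum(sum(row) for row in adj) / len(adj)


def blakley_roy_holds(adj, length):
    """Check the inequality walks >= n * d^length for one graph and one length."""
    n = len(adj)
    return total_walks(adj, length) >= n * average_degree(adj) ** length


# A star with one centre and three leaves: n = 4, d = 1.5 (hypothetical example).
star = [[0, 1, 1, 1],
        [1, 0, 0, 0],
        [1, 0, 0, 0],
        [1, 0, 0, 0]]
for length in (1, 2, 3, 4):
    print(length, total_walks(star, length), 4 * 1.5 ** length)
```

For regular graphs the inequality holds with equality; irregular graphs such as the star have strictly more walks than $nd^{\ell}$ once $\ell \ge 2$.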

Recall that we write $w_t(u, v)$ for the number of walks of length $t$ from $u$ to
$v$. Chung and Graham [17] impose the condition that $w_{\ell-1}(u, v) \le c_0 p^{\ell-1}n^{\ell-2}$
holds for every pair of vertices $u$, $v$, where $c_0$ is a constant: they call this
condition $U(\ell)$. In other words, the number of walks from $u$ to $v$ is at most a
constant times what it should be. Normalizing so that $G_n$ contains exactly $pn^2/2$
edges, Chung and Graham note that $U(\ell)$ can only hold if $p = \Omega(n^{-1+1/(\ell-1)})$:
otherwise, the expected number of walks of length $\ell - 1$ from a random $u$ to
a random $v$ is much less than 1, so $w_{\ell-1}(u, v)$ must sometimes be much larger
than its expectation.

In fact, $U(\ell)$ cannot hold unless $p$ is quite a bit larger, but for the 'wrong'
reason: taking $\ell$ odd for simplicity, let $\ell = 2k + 1$. Considering walks of length
$\ell - 1$ formed by tracing a walk of length $k$ forwards and then backwards, we see
that if $G_n$ has $pn^2/2$ edges, then

$$\sum_{v} w_{\ell-1}(v, v) \ge \mathrm{hom}(P_k, G_n) \ge n(np)^k, \qquad (58)$$

where the second inequality is Theorem 5.16. Thus there is some $v$ with
$w_{\ell-1}(v, v) \ge (np)^k$, and it follows that $U(\ell)$ can only hold if $p = \Omega(n^{-1+2/(\ell-1)})$,
so Theorem 8 of [17] can only be applied for $p$ in this range. Note that this
is an essential problem: this result counts homomorphisms (Chung and Graham
use the notation C G} for $\mathrm{hom}(H, G)$), and the bound on $w_{\ell-1}(u, v)$
is definitely used with $u = v$. Indeed, as we shall see, the conclusion fails if
$p = o(n^{-1+2/(\ell-1)})$.

Turning to Theorem 5.14, the condition that $t_p(C_{2\ell-2}, G_n)$ remain bounded
corresponds roughly to the condition $U(\ell)$: indeed, the former says exactly that

$$\sum_{u,v} w_{\ell-1}(u, v)^2 = O(n^{2\ell-2}p^{2\ell-2}), \qquad (59)$$

which follows immediately from $U(\ell)$. It turns out that the problem described
above does not arise with (59): in this second moment (rather than uniform)
condition, the few pairs with $u = v$ matter less. Indeed, it is easy to check
that in $G(n, p)$, for example, (59) holds as long as $p = \Omega(n^{-1+1/(\ell-1)})$. [The
expected number of homomorphisms from $C_{2\ell-2}$ whose image is a tree with $k$
edges is $O(n(np)^k) = O(n(np)^{\ell-1})$, and the expected number whose image is a
graph with $k$ vertices containing a cycle is $O(n^k p^k) = O((np)^{2\ell-2})$.] However,
the same problem arises in a different place.

As before, let $H_{k,\ell} \in \mathcal{F}_\ell$ be the 'theta graph' formed by $k$ paths of length $\ell$
joining the same pair $(s, t)$ of vertices, with the paths internally vertex disjoint.
Suppose that $\ell$ is even. Writing $w_t(v) = w_t(v, V(G_n))$ for the number of walks
of length $t$ in $G_n$ starting at $v$, normalizing still so that $e(G_n) = pn^2/2$, and
considering homomorphisms from $H_{k,\ell}$ to $G_n$ mapping $s$ and $t$ to a common
vertex $v$, we have

$$\mathrm{hom}(H_{k,\ell}, G_n) \ge \sum_{v} w_{\ell/2}(v)^k \ge n\Bigl(\frac{1}{n}\sum_{v} w_{\ell/2}(v)\Bigr)^{k} \ge n(np)^{k\ell/2},$$

where the second inequality is from convexity and the last from Theorem 5.16.
Since $|H_{k,\ell}| = 2 + k(\ell - 1)$ and $e(H_{k,\ell}) = k\ell$, it follows that $t_p(H_{k,\ell}, G_n) \ge
n^{k-1}(np)^{-k\ell/2}$. Suppose that $p \le n^{-1+2/\ell-\varepsilon}$ for some $\varepsilon > 0$. Then taking $k$
large enough we see that $t_p(H_{k,\ell}, G_n) \to \infty$, so neither the assumptions nor the
conclusion of Theorem 5.14 can hold. When $\varepsilon$ is small, this value of $p$ is much
larger than that above which the number of subgraphs of $G(n, p)$ isomorphic to
$H_{k,\ell}$ is well behaved.
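
The exponent arithmetic behind this claim can be sketched as follows, starting from the lower bound $\mathrm{hom}(H_{k,\ell}, G_n) \ge n(np)^{k\ell/2}$ obtained from convexity and Theorem 5.16, and assuming the normalization $t_p(F, G) = \mathrm{hom}(F, G)/(n^{|F|}p^{e(F)})$ used throughout this section.

```latex
\[
  t_p(H_{k,\ell}, G_n)
  = \frac{\mathrm{hom}(H_{k,\ell}, G_n)}{n^{2 + k(\ell-1)}\, p^{k\ell}}
  \ge \frac{n\,(np)^{k\ell/2}}{n^{2 + k(\ell-1)}\, p^{k\ell}}
  = n^{k-1} (np)^{-k\ell/2}.
\]
% If p \le n^{-1 + 2/\ell - \varepsilon}, then np \le n^{2/\ell - \varepsilon}, so
\[
  n^{k-1} (np)^{-k\ell/2}
  \ge n^{k-1} \cdot n^{-k + \varepsilon k \ell / 2}
  = n^{\varepsilon k \ell/2 - 1}
  \to \infty
  \quad\text{for any fixed } k > 2/(\varepsilon \ell).
\]
```

The divergence comes entirely from degenerate homomorphisms that fold each of the $k$ paths onto a walk traced out and back, which is exactly the kind of contribution that counting embeddings would exclude.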

The calculations above illustrate the problem with working with $t_p$: we
count certain trees as copies of $H_{k,\ell}$, for example, and the number of these trees
exceeds the number of embeddings of $H_{k,\ell}$ in a wide range of densities in which
Theorem 5.14 might otherwise apply. For this reason, if we could replace $t_p$ by
$s_p$ throughout the statement of the theorem, we would obtain a much stronger
and more satisfactory result: not only would it count embeddings, which is what
we are really interested in, but it would apply to a much larger family of graphs,
for example, to random graphs with much lower densities. Unfortunately, the
proof breaks down in various places if we simply replace $t_p$ by $s_p$. However, the
next result is a major step in this direction.

Given vertices $v$, $w$ of a graph $G_n$, suppressing the dependence on $G_n$, let
us write $p_\ell(v, w)$ for the number of paths of length $\ell$ from $v$ to $w$, so $p_\ell(v, w) \le
w_\ell(v, w)$.

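Theorem 5.17. Let $C > 0$ and $\ell \ge 3$ be fixed, and let $p = p(n)$ be any function
of $n$. Let $(G_n)$ be a sequence of graphs satisfying the following three conditions:

$$\sup_n s_p(F, G_n) < \infty \text{ for each } F \in \mathcal{T}, \qquad (60)$$

$$\sum_{u} \sum_{v \ne u} p_{\ell-1}(u, v)^2 = O(n^{2\ell-2}p^{2\ell-2}), \qquad (61)$$

and

$$\sum_{u} \sum_{v \ne u} p_{\ell}(u, v)^k = O(n^{2+k(\ell-1)}p^{k\ell}), \qquad (62)$$

for each fixed $k \ge 1$. Suppose also that $d_{\mathrm{cut}}(G_n, \kappa) \to 0$ for some kernel $\kappa :
[0,1]^2 \to [0, C]$. Then $s_p(F, G_n) \to s(F, \kappa)$ for each $F \in \mathcal{F}_\ell$.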

Before turning to the proof of this result, let us make some remarks on the
conditions above. Firstly, in (60) it makes no difference whether we write $s_p$ or
$t_p$, by Lemma 5.12.

Condition (62) is almost the same as the condition $s_p(H_{k,\ell}, G_n) = O(1)$.
Indeed, $\mathrm{emb}(H_{k,\ell}, G_n)$ is simply the sum over distinct $u$ and $v$ of the number of
$k$-tuples of internally vertex disjoint paths from $u$ to $v$, so (62), which bounds
the same sum without the restriction to disjoint paths, is formally stronger than
$s_p(H_{k,\ell}, G_n) = O(1)$. Since there are (typically) many paths from $u$ to $v$ in the
range of $p$ for which (61) may hold, it seems very likely that, assuming the other
conditions of Theorem 5.17, $s_p(H_{k,\ell}, G_n) = O(1)$ implies (62), so (62) could be
replaced by this more pleasant condition. However, we do not have a proof of
this.
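
The relation between embeddings of the theta graph and the path-count moment bounded in (62) can be checked by brute force on tiny graphs. The Python sketch below is illustrative only: the function names are hypothetical, the theta graph is built with its two end vertices labelled 0 and 1, and the host graph $K_4$ is an arbitrary choice.

```python
from itertools import permutations


def count_paths(adj, u, v, length):
    """Number of paths (no repeated vertices) with `length` edges from u to v."""
    def extend(current, visited, remaining):
        if remaining == 0:
            return int(current == v)
        return sum(extend(w, visited | {w}, remaining - 1)
                   for w in range(len(adj))
                   if adj[current][w] and w not in visited)
    return extend(u, {u}, length)


def theta_graph_edges(k, length):
    """Edges of H_{k,l}: vertices 0 and 1 joined by k internally disjoint paths."""
    edges, next_vertex = [], 2
    for _ in range(k):
        prev = 0
        for _ in range(length - 1):
            edges.append((prev, next_vertex))
            prev, next_vertex = next_vertex, next_vertex + 1
        edges.append((prev, 1))
    return edges, next_vertex  # next_vertex is the number of vertices


def count_embeddings(edges, size, adj):
    """Number of injective homomorphisms of the pattern into the host graph."""
    return sum(all(adj[image[a]][image[b]] for a, b in edges)
               for image in permutations(range(len(adj)), size))


def path_moment(adj, length, k):
    """Sum over ordered pairs u != v of p_l(u, v)^k."""
    n = len(adj)
    return sum(count_paths(adj, u, v, length) ** k
               for u in range(n) for v in range(n) if u != v)


k4 = [[int(i != j) for j in range(4)] for i in range(4)]
edges, size = theta_graph_edges(2, 2)  # H_{2,2} is the 4-cycle
print(count_embeddings(edges, size, k4), path_moment(k4, 2, 2))
```

In $K_4$ each ordered pair of distinct vertices is joined by 2 paths of length 2, giving a moment of 48, while only the 24 pairs of distinct (hence internally disjoint) paths give embeddings of the 4-cycle.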

Similarly, condition (61) is closely related to $s_p(C_{2\ell-2}, G_n) = O(1)$, and
could perhaps be replaced by this weaker condition. This is less clear, however,
as Theorem 5.17 can be applied for $p$ small enough that the typical number of
paths of length $\ell - 1$ between a given pair of vertices is $O(1)$.

Instead of (61) we can always impose the stronger condition $t_p(C_{2\ell-2}, G_n) =
O(1)$; these conditions are probably equivalent in the present setting. The
corresponding statement for (62) and the stronger assumption $t_p(H_{k,\ell}, G_n) = O(1)$
is not true; see the discussion of the behaviour of $t_p(H_{k,\ell}, G_n)$ in the paragraphs
preceding Theorem 5.17.

Finally, let us note that (62) gives us control over $s_p(F_\ell, G_n)$ for all $F_\ell \in \mathcal{F}_\ell$,
not just for $F_\ell = H_{k,\ell}$. Let $F_\ell$ be obtained by subdividing a graph $F$ with vertex
set $u_1, u_2, \ldots, u_k$. Then

$$\mathrm{emb}(F_\ell, G_n) \le \sum_{v_1, v_2, \ldots, v_k} \prod_{u_iu_j \in E(F)} p_\ell(v_i, v_j),$$

where the sum is over all $n_{(k)}$ $k$-tuples of distinct vertices of $G_n$. Applying
Hölder's inequality as in the proof (56) of (57), but in a probability space with
$n_{(k)}$ elements rather than $n^k$, we find that

$$\mathrm{emb}(F_\ell, G_n) \le n_{(k)}\, \mathbb{E}(p_\ell(v_1, v_2)^{e(F)}),$$

where the expectation is over the choice of a random pair $(v_1, v_2)$ of distinct
vertices of $G_n$. Condition (62) bounds the final expectation; as usual the
normalizing factors work out, and we see that if (62) holds for every $k$ then
$s_p(F_\ell, G_n) = O(1)$ for every $F_\ell \in \mathcal{F}_\ell$.

Outline proof of Theorem 5.17. Since the proof is a relatively simple modification
of that of Theorem 5.14, we shall give only an outline, concentrating on
the differences.

The first change we make is that we work with paths rather than walks,
replacing the quantities $w_t(u,v)$, $t = \ell - 1, \ell$, appearing in the proof of
Theorem 5.14 with the corresponding quantities $p_t(u,v)$. By Lemma 5.12, all but
a vanishing fraction of the walks in $G_n$ of a given length are paths, so (47),
for example, implies the same statement with $w_\ell(v,w)$ replaced by $p_\ell(v,w)$. Of
course, (47) was proved using the assumption $t_p(C_{2\ell-2}, G_n) = O(1)$, whereas
we now have the weaker assumption (61). However, following through the proof
it is easy to see that if we count paths instead of walks, then (61) suffices. (The
key point is that (61) suffices to bound the number of bad paths, i.e., paths
between endpoints $u$, $v$ with $p_{\ell-1}(u,v) > M n^{\ell-2} p^{\ell-1}$.)

Let us fix (a small) $\varepsilon > 0$ and a graph $F_\ell \in \mathcal{F}_\ell$. We also fix an integer $N$ to
be chosen later, depending only on $\varepsilon$ and $F_\ell$. Finally, let $\eta$ be a small positive
constant depending on $\varepsilon$, $F_\ell$ and $N$. For reasons that will become clear later, we
first partition $V(G_n)$ into $N$ almost equal parts $Q_1, \ldots, Q_N$. Then we take an
$(\eta,p)$-regular partition $(P_i)$ with each $P_i$ contained in some $Q_j$. For the moment
we ignore the partition $(Q_i)$.

As before, passing to a subsequence we assume that the densities $d_p(P_i, P_j)$
converge to a finite-type kernel $\kappa$. Let $S \subset V \times V$ be the set of pairs of vertices
joined by the 'wrong' number of paths of length $\ell$:

$$S = \bigl\{(v,w) : v \ne w,\ |p_\ell(v,w) - \kappa_\ell(v,w)\, n^{\ell-1} p^\ell| > \varepsilon n^{\ell-1} p^\ell \bigr\}.$$

If $\eta$ is chosen small enough then the proofs of (47) and (48) carry through
counting paths instead of walks, and (replacing $\varepsilon$ by $\varepsilon^2/10$) the equivalents of
(47) and (48) imply that

$$|S| \le \varepsilon n(n-1). \qquad (63)$$

We proceed from here to our bound on $s_p(F_\ell, G_n)$ in two steps. First we count
something that is not quite an embedding of $F_\ell$.

Let $F_\ell$ be obtained from the loopless multigraph $F$ by subdividing each edge
$\ell - 1$ times, and let $u_1, \ldots, u_k$ be the vertices of $F$, which we also regard as
vertices of $F_\ell$. By a semiembedding of $F_\ell$ into $G_n$ we mean a homomorphism
from $F_\ell$ into $G_n$ that maps the vertices $u_1, \ldots, u_k$ to distinct vertices of $G_n$,
and each of the $e(F)$ $u_i$-$u_j$ paths of length $\ell$ that make up the graph $F_\ell$ into a
path in $G_n$. Clearly, every embedding is a semiembedding; the only additional
condition on an embedding is that the paths in $G_n$ are internally vertex disjoint.

Let $\mathrm{emb}^+(F_\ell, G_n) \ge \mathrm{emb}(F_\ell, G_n)$ denote the number of semiembeddings of
$F_\ell$ into $G_n$. Then, from the definition of a semiembedding, we have

$$\mathrm{emb}^+(F_\ell, G_n) = \sum_{v_1, \ldots, v_k} \ \prod_{u_i u_j \in E(F)} p_\ell(v_i, v_j), \qquad (64)$$

where the sum is over all $n_{(k)}$ sequences $(v_1, \ldots, v_k)$ of distinct vertices of $G_n$
and, as usual, any multiple edges in $F$ give rise to multiple factors in the product.

As before, we can rewrite the formula above as an expectation over a
random choice of $(v_1, \ldots, v_k)$. Normalizing correctly for a change, let $X_{ij}$ be
the random variable $p_\ell(v_i, v_j)/(n^{\ell-1} p^\ell)$, so

$$s_p^+(F_\ell, G_n) = \frac{\mathrm{emb}^+(F_\ell, G_n)}{n_{(|F_\ell|)}\, p^{e(F_\ell)}} \sim \frac{\mathrm{emb}^+(F_\ell, G_n)}{n_{(k)}\, n^{|F_\ell| - |F|}\, p^{e(F_\ell)}} = \mathbb{E} \prod_{u_i u_j \in E(F)} X_{ij}.$$

Equation (63) says, roughly speaking, that each $X_{ij}$ is with high probability
close to 'what it should be', which is a random variable depending on $\kappa$, the
kernel corresponding to the partition $(P_1, \ldots, P_k)$ of $G_n$. We should like to
deduce that the expectation of the product is close to what it should be.

Let $Z$ be the set of $k$-tuples $(v_1, \ldots, v_k)$ with the $v_i$ distinct such that
$(v_i, v_j) \in S$ for some $1 \le i < j \le k$. Regarding $Z$ as an event in our
probability space,

$$\mathbb{P}(Z) \le \binom{k}{2}\, \mathbb{P}\bigl((v_1, v_2) \in S\bigr) \le \varepsilon \binom{k}{2},$$

from (63). Holder's inequality thus gives

$$\mathbb{E}\Bigl(1_Z \prod_{u_i u_j \in E(F)} X_{ij}\Bigr) \le \mathbb{E}\bigl(1_Z^{e(F)+1}\bigr)^{1/(e(F)+1)} \prod_{u_i u_j \in E(F)} \mathbb{E}\bigl(X_{ij}^{e(F)+1}\bigr)^{1/(e(F)+1)},$$

where $1_Z$ is the indicator function of the event $Z$. Now, for each $i$ and $j$, we
have

$$\mathbb{E}\bigl(X_{ij}^{e(F)+1}\bigr) = \frac{1}{n(n-1)} \sum_u \sum_{v \ne u} \Bigl(\frac{p_\ell(u,v)}{n^{\ell-1} p^\ell}\Bigr)^{e(F)+1},$$

which is $O(1)$ by our assumption (62). Also, $\mathbb{E}(1_Z^{e(F)+1}) = \mathbb{E}(1_Z) = \mathbb{P}(Z) \le \binom{k}{2}\varepsilon = O(\varepsilon)$.
Hence,

$$\mathbb{E}\Bigl(1_Z \prod_{u_i u_j \in E(F)} X_{ij}\Bigr) = O\bigl(\varepsilon^{1/(e(F)+1)}\bigr). \qquad (65)$$

In other words, the contribution to (64) from semiembeddings mapping some
edge of $F$ into a pair $(u,v) \in S$ is negligible. By definition of $S$, the contribution
from all other semiembeddings is 'what it should be', and it follows that

$$|s_p^+(F_\ell, G_n) - s(F_\ell, \kappa)| \le O\bigl(\varepsilon^{1/(e(F)+1)}\bigr) + O(\varepsilon).$$

Since $\varepsilon > 0$ was arbitrary, we thus have $s_p^+(F_\ell, G_n) \to s(F_\ell, \kappa)$.

In the end, of course, it is $s_p(F_\ell, G_n)$ that we wish to bound, not $s_p^+(F_\ell, G_n)$.
Since $s_p(F_\ell, G_n) \le s_p^+(F_\ell, G_n)$ it remains to show that most semiembeddings are
in fact embeddings, i.e., that the paths in $G_n$ making up a typical semiembedding
are internally vertex disjoint. For paths corresponding to vertex disjoint
edges of $F$, this is quite easy, using the fact that $s_p(T, G_n)$ is bounded for each
tree $T$, which tells us that almost all pairs of paths of length $\ell$ are vertex disjoint.
For paths corresponding to edges of $F$ sharing a vertex, there is a similar
argument. We shall not spell these arguments out as there is a third case that
cannot be handled in this way, namely paths corresponding to duplicate edges
in $F$. We must allow these, since we include, for example, $C_{2\ell}$ in $\mathcal{F}_\ell$. It is in
handling these paths that our 'crude' partition $(Q_i)$ comes in.

Let us classify paths $w_0 w_1 \ldots w_\ell$ in $G_n$ into $N^{\ell+1}$ types, according to which
part $Q_i$ each $w_i$ lies in. We say that a pair $(u,v)$ of distinct vertices of $G_n$ is
good if, for all $N^{\ell-1}$ possible types of $u$-$v$ path, the number of $u$-$v$ paths of this
type is 'close' to what it should be, i.e., within $\varepsilon |Q_i|^{\ell-1} p^\ell \sim \varepsilon n^{\ell-1} p^\ell / N^{\ell-1}$ of
what it should be. As usual, 'what it should be' means the expected number in
$G_p(n, \kappa)$, which depends not only on which parts $P_i$ the vertices $u$ and $v$ lie in,
but also on the type of path being considered. Let $S'$ be the set of pairs $(u,v)$,
$u \ne v$, that are bad, i.e., not good.

Since $N$ is fixed before $\eta$ is chosen, it is not hard to see that the argument
giving (63) (applied with $\varepsilon / N^{\ell-1}$ in place of $\varepsilon$) also shows that $|S'| \le \varepsilon n(n-1)$;
we omit the details. In other words, almost all pairs of vertices are joined by
about the right number of paths of any given type. As before, we break down
the set of embeddings of $F_\ell$ into $G_n$ according to which vertices $v_1, \ldots, v_k$ of
$G_n$ the 'branch vertices' $u_1, \ldots, u_k$ are mapped to. Defining $Z'$ analogously
to $Z$, but using $S'$ instead of $S$, the argument giving (65) shows that we may
assume that $(v_1, \ldots, v_k) \notin Z'$, i.e., that no pair $(v_i, v_j)$ is in $S'$. Counting
embeddings with $v_1, \ldots, v_k$ fixed, it remains to choose $e(F)$ paths joining the
appropriate pairs $v_i$, $v_j$. Let us choose these paths one by one. Since the total
number of paths joining $v_i$ to $v_j$ is about what it should be, all we must show
is that few (say at most $\varepsilon n^{\ell-1} p^\ell$) paths from $v_i$ to $v_j$ meet one of our at most
$e(F) - 1$ earlier paths. But this is now easy: we must avoid a set $X$ of at most
$(e(F) - 1)(\ell - 1) = O(1)$ vertices, the internal vertices of the previously chosen
paths. In fact, we shall do much more, avoiding any part $Q_a$ that meets $X$. This
rules out at most $(\ell - 1)|X| N^{\ell-2}$ of the $N^{\ell-1}$ types of $v_i$-$v_j$ paths. Choosing $N$
large enough (larger than $1/\varepsilon$), this is only a fraction $O(\varepsilon)$ of all possible types.
Since $(v_i, v_j) \notin S'$, we have almost the right number of paths of each remaining
type, and hence almost the right number of paths in total. This completes our
outline proof of Theorem 5.17. □

Of course, there is a variant of Theorem 5.17 which is to Theorem 5.17 as
Theorem 5.15 is to Theorem 5.14; we shall not state this separately.

Let us close this section by giving one simple example of a setting in which
the conditions of Theorem 5.17 are satisfied. Fix $\ell \ge 3$, and suppose that our
sequence $(G_n)$ has the following two properties. Firstly, the maximal degree
$\Delta(G_n)$ is not too large:

$$\Delta(G_n) \le Mpn, \qquad (66)$$

for some constant $M$. Secondly,

$$p_{\ell-1}(u,v) \le M n^{\ell-2} p^{\ell-1} \qquad (67)$$

for all $u \ne v \in V(G_n)$. Condition (66) is called DEG in Chung and Graham [17];
condition (67) is related to their condition $U(\ell)$, but, as noted in the paragraph
containing (58), is much weaker. In particular, it is easy to check that if $p = n^{-\alpha}$
with $0 < \alpha < 1$ constant, and $\kappa$ is any bounded kernel, then the random graphs
$G_p(n, \kappa)$ satisfy (66) and (67) with probability 1, as long as $\alpha < 1 - 1/(\ell - 1)$. If
(66) and (67) hold then $p_\ell(v,w) \le M^2 n^{\ell-1} p^\ell$ for all $v$ and $w$, while $s_p(T, G_n) \le
M^{e(T)}$ for any tree $T$, so the conditions of Theorem 5.17 are satisfied. Similarly,
$p_t(v,w) \le M^{t-\ell+2} n^{t-1} p^t$ holds for all $t \ge \ell$, so the variant of Theorem 5.17
corresponding to Theorem 5.15 applies.
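
For very small graphs, conditions (66) and (67) can be checked by brute force. The following Python sketch is purely illustrative and is not taken from the paper: the function names `count_paths` and `satisfies_66_67`, and the representation of $G_n$ as a dictionary mapping each vertex to its set of neighbours, are assumptions made here. The exhaustive search takes time exponential in the path length, so it is only a sanity check, not a practical test.

```python
from itertools import combinations


def count_paths(adj, u, v, length):
    """Count paths (no repeated vertices) with `length` edges from u to v."""
    def extend(current, visited, remaining):
        if remaining == 0:
            return 1 if current == v else 0
        total = 0
        for w in adj[current]:
            if w not in visited:
                total += extend(w, visited | {w}, remaining - 1)
        return total

    return extend(u, {u}, length)


def satisfies_66_67(adj, p, ell, M):
    """Check the degree bound (66) and the path bound (67) with constant M."""
    n = len(adj)
    # Condition (66): maximal degree at most M * p * n.
    if max(len(nbrs) for nbrs in adj.values()) > M * p * n:
        return False
    # Condition (67): at most M * n^(ell-2) * p^(ell-1) paths of length
    # ell - 1 between any two distinct vertices (path counts are symmetric).
    bound = M * n ** (ell - 2) * p ** (ell - 1)
    for u, v in combinations(adj, 2):
        if count_paths(adj, u, v, ell - 1) > bound:
            return False
    return True
```

For instance, calling `satisfies_66_67` on the 5-cycle with `p = 0.4` and `ell = 3` tests whether every pair of vertices has at most `M * 5 * 0.16` common neighbours.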

It follows that conditions (66) and (67) provide an answer to Question 5.9.
Indeed, Theorem 5.17 tells us that, under these conditions, if $\kappa$ is a bounded
kernel, then $d_{\mathrm{cut}}(G_n, \kappa) \to 0$ implies $s_p(F, G_n) \to s(F, \kappa)$ for all $F \in \mathcal{F}_\ell$; its
variant gives us $s_p(F, G_n) \to s(F, \kappa)$ for all $F \in \mathcal{F}_{\ge \ell}$. By Theorem 5.1 the counts
$s(F, \kappa)$, $F \in \mathcal{F}_{\ge \ell}$, do determine the kernel (up to equivalence), so conditions (66)
and (67) are 'suitable' in the sense of Question 5.9. As noted after Question 5.9,
this implies the following result.

Theorem 5.18. Fix $\ell \ge 3$, let $p = p(n)$ be any function, and let $(G_n)$ be
a sequence of graphs satisfying (66), (67) and the bounded density
assumption 4.1. Then, for any bounded kernel $\kappa$, we have $d_{\mathrm{cut}}(G_n, \kappa) \to 0$ if and
only if $d_{\mathrm{sub}}(G_n, \kappa) \to 0$, where $d_{\mathrm{sub}}$ is defined using $\mathcal{A} = \mathcal{T} \cup \mathcal{F}_{\ge \ell}$ for the set of
admissible graphs. □

In this section we discussed how to extend the subgraph (count) metric 
to sparse graphs, noting that there are various possibilities (depending on the 
choice of the set A of admissible graphs), and conjectured that one particular 
extension is equivalent to the cut metric. In the next section we turn to a 
different metric, that extends much more easily to sparse graphs. 

6 The partition metric 

As noted in Section 2, for dense graphs there are many natural metrics that
turn out to be equivalent, in the sense of generating the same topology. So
far we have focussed on the cut and subgraph (or count) metrics; we now turn
to the partition metric, introduced by Borgs, Chayes, Lovasz, Sos and
Vesztergombi [16]. In the dense case, it turns out to be relatively easy to show that
the partition and cut metrics are equivalent; in this brief section we show that,
under mild assumptions, this equivalence holds also in the sparse setting, as
long as $np \to \infty$.

On the one hand, this result (Theorem 6.2 below) shows that for graphs with
$\omega(n)$ edges, no new questions arise by considering the partition metric. On the
other hand, it reinforces the conclusion that the cut metric remains extremely
natural for sparse graphs, and gives a way of considering the cut metric from
a very different point of view. There is another, very important, motivation
for introducing partition metrics for sparse graphs: when we come to extremely
sparse graphs, with $O(n)$ edges, the cut metric turns out to make very little sense,
while the partition metric (which is no longer equivalent) remains natural. This
is a major topic in its own right and will be discussed in a companion paper [11].

6.1 Partition matrices and the partition metric 

Turning to the formal definitions, as in the rest of the paper, let $p = p(n)$ be a
normalizing function and $G_n$ a graph with $n$ vertices. Let $k \ge 2$ be fixed. For
$n \ge k$ and $\Pi = (P_1, \ldots, P_k)$ a partition of $V(G_n)$ into $k$ non-empty parts, let
$M_\Pi(G_n) = (d_p(P_i, P_j))_{1 \le i,j \le k}$ be the matrix encoding the normalized densities
of edges between the parts of $\Pi$ (see (27)). Since $M_\Pi(G_n)$ is symmetric, we may
think of this matrix as an element of $\mathbb{R}^{k(k+1)/2}$. Set

$$\mathcal{M}_k(G_n) = \{M_\Pi(G_n)\} \subset \mathbb{R}^{k(k+1)/2},$$

where $\Pi$ runs over all balanced partitions of $V(G_n)$ into $k$ parts, i.e., all
partitions $(P_1, \ldots, P_k)$ with $\bigl||P_i| - |P_j|\bigr| \le 1$.
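
The definition of a single partition matrix can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the paper's own formulation: the function name `density_matrix` is hypothetical, and we assume that $d_p(P_i, P_j) = e(P_i, P_j)/(p|P_i||P_j|)$ with $e(A,B)$ counting ordered pairs $(a,b) \in A \times B$ joined by an edge, so that an edge inside a part is counted twice. The precise convention is the one fixed by (27), which is not reproduced in this section.

```python
def density_matrix(edges, parts, p):
    """Matrix of normalized densities e(P_i, P_j) / (p * |P_i| * |P_j|).

    `edges` is a list of unordered pairs, `parts` a list of disjoint
    vertex sets covering the graph, and `p` the normalizing density.
    """
    part_of = {v: i for i, part in enumerate(parts) for v in part}
    k = len(parts)
    counts = [[0] * k for _ in range(k)]
    for u, v in edges:
        i, j = part_of[u], part_of[v]
        # Count the ordered pairs (u, v) and (v, u); for i == j this
        # adds 2 to the diagonal entry (assumed convention).
        counts[i][j] += 1
        counts[j][i] += 1
    return [
        [counts[i][j] / (p * len(parts[i]) * len(parts[j])) for j in range(k)]
        for i in range(k)
    ]
```

The set $\mathcal{M}_k(G_n)$ would then be obtained by applying this function to every balanced partition into $k$ parts, which is feasible only for very small $n$.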

As usual, we assume that $G_n$ has $O(pn^2)$ edges. For definiteness, let us
assume that $e(G_n) \le Cpn^2/2$. Since each part of a balanced partition has
size at least $n/(2k)$, the entries of any $M_\Pi(G_n) \in \mathcal{M}_k(G_n)$ are bounded by
$C_k = (2k)^2 C$, say. Thus, $\mathcal{M}_k(G_n)$ is a subset of the compact space
$B_k = [0, C_k]^{k(k+1)/2}$.

Let $\mathcal{C}_0(B_k)$ denote the set of non-empty compact subsets of $B_k$, and let $d_H$
be the Hausdorff metric on $\mathcal{C}_0(B_k)$, defined with respect to the $\ell_\infty$ distance, say.
Thus

$$d_H(X,Y) = \inf\{\varepsilon > 0 : X^{(\varepsilon)} \supseteq Y,\ Y^{(\varepsilon)} \supseteq X\},$$

where $X^{(\varepsilon)}$ denotes the $\varepsilon$-neighbourhood of $X$ in the $\ell_\infty$ metric. Since $(B_k, \ell_\infty)$
is compact, by standard results (see, for example, Dugundji [20, p. 253]), the
space $(\mathcal{C}_0(B_k), d_H)$ is compact. To ensure that the metric we are about to define
is a genuine metric, it is convenient to work with $\mathcal{C}(B_k) = \mathcal{C}_0(B_k) \cup \{\emptyset\}$, setting
$d_H(\emptyset, X) = C_k$, say, for any $X \in \mathcal{C}_0(B_k)$, so the empty set is an isolated point in
$(\mathcal{C}(B_k), d_H)$.
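
For finite sets of density matrices, which is all that arises for a fixed graph, the Hausdorff distance is a max-min computation. The Python sketch below is illustrative only: the names `linf` and `hausdorff` are hypothetical, each matrix is assumed to be flattened into a tuple of its entries, and the argument `empty_distance` plays the role of the constant $C_k$ assigned to the empty set above.

```python
def linf(a, b):
    """l-infinity distance between two equal-length tuples of numbers."""
    return max(abs(x - y) for x, y in zip(a, b))


def hausdorff(X, Y, empty_distance):
    """Hausdorff distance between finite point sets X and Y.

    If exactly one of the sets is empty, return `empty_distance`,
    so that the empty set is an isolated point.
    """
    if not X and not Y:
        return 0.0
    if not X or not Y:
        return empty_distance
    # Smallest eps such that every point of X is within eps of Y ...
    forward = max(min(linf(x, y) for y in Y) for x in X)
    # ... and every point of Y is within eps of X.
    backward = max(min(linf(x, y) for x in X) for y in Y)
    return max(forward, backward)
```

Because the sets are finite, the infimum in the definition is attained and equals the larger of the two one-sided distances.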

Let $\mathcal{C} = \prod_{k \ge 2} \mathcal{C}(B_k)$, and let $\mathcal{M} : \mathcal{F} \to \mathcal{C}$ be the map defined by

$$\mathcal{M}(G_n) = \bigl(\mathcal{M}_k(G_n)\bigr)_{k=2}^{\infty}$$

for every graph $G_n$ on $n$ vertices, noting that $\mathcal{M}_k(G_n)$ is empty if $k > n$. Then
we may define the partition metric $d_{\mathrm{part}}$ by

$$d_{\mathrm{part}}(G, G') = d\bigl(\mathcal{M}(G), \mathcal{M}(G')\bigr),$$

where $d$ is any metric on $\mathcal{C}$ giving rise to the product topology. Considering
the partition of an $n$ vertex graph into $n$ parts shows that $d_{\mathrm{part}}$ is a metric on
the set $\mathcal{F}$ of isomorphism classes of finite graphs. Recalling that each space
$(\mathcal{C}(B_k), d_H)$ is compact, the key property of the partition metric is that $(G_n)$
is Cauchy with respect to $d_{\mathrm{part}}$ if and only if there are compact sets $Y_k \subset B_k$
such that $d_H(\mathcal{M}_k(G_n), Y_k) \to 0$ for each $k$. In particular, convergence in $d_{\mathrm{part}}$ is
equivalent to convergence of the set of partition matrices for each fixed $k$. Thus
we may always think of $k$ as fixed and $n$ as much larger than $k$.

In the dense case, a metric equivalent to $d_{\mathrm{part}}$ has been introduced
independently by Borgs, Chayes, Lovasz, Sos and Vesztergombi [16]; the only difference
is that in [16], all partitions into $k$ parts are considered, rather than just
balanced partitions. Of course, one then needs to take care to ensure that the
densities between small parts are counted with an appropriate weight when
computing the distance between density matrices in $\mathcal{M}_k$. Whether one takes all
partitions or just balanced partitions is a matter of taste: it is very easy to see
that convergence in either of the resulting metrics implies convergence in the
other.

We may extend the map $\mathcal{M} : \mathcal{F} \to \mathcal{C}$, and hence $d_{\mathrm{part}}$, to bounded kernels in
a natural way: instead of partitioning the vertex set into $k$ almost equal parts,
we partition $[0,1]$ into $k$ exactly equal parts, and consider the closure of the
set of 'density matrices' that may be obtained from $\kappa$ using such partitions;
we omit the details. Note that, as shown by Borgs, Chayes, Lovasz, Sos and
Vesztergombi [16, Example 4.4], the set of density matrices is not in general
closed.

As for the cut metric, it is easy to check that it makes little difference whether
we define $d_{\mathrm{part}}$ for graphs directly, or by going via kernels. (The corresponding
dense result appears in [16]; the sparse case here is slightly more complicated
due to the possibility of 'high-degree' vertices.)

Lemma 6.1. Let $p = p(n)$ satisfy $p \ge 1/n$, and let $(G_n)$ be a sequence of
graphs with $e(G_n) = O(pn^2)$ and $\Delta(G_n) = o(pn^2)$. Then $d_{\mathrm{part}}(G_n, \kappa_{G_n}) \to 0$
as $n \to \infty$.

Proof. By definition, we must show that $d_H(\mathcal{M}_k(G_n), \mathcal{M}_k(\kappa_{G_n})) \to 0$ for each
$k > 1$. Fix $k$. Since $e(G_n) = O(pn^2)$, there is a constant $D$ such that at most
$n/(2k)$ vertices of $G_n$ have degree more than $Dpn$. Let $L$ denote the set of
'low-degree' vertices, with degree at most $Dpn$, so $|L| \ge n - n/(2k)$.

We must show that for any density matrix in $\mathcal{M}_k(G_n)$ there is a nearby
matrix in $\mathcal{M}_k(\kappa_{G_n})$, and vice versa. The forward implication is trivial: a balanced
partition $\Pi$ of $V(G_n)$ corresponds to a partition of $[0,1]$ into sets whose sizes
differ by $O(1/n) = o(1)$. Adjusting these parts slightly, making changes only
in subintervals of $[0,1]$ corresponding to low degree vertices, the entries of the
corresponding density matrix change by $o(1)$.

For the reverse implication, let $\Pi$ be a partition of $[0,1]$ into $k$ parts $P_1, \ldots, P_k$,
and let $M \in \mathcal{M}_k(\kappa_{G_n})$ be the corresponding density matrix, with entries $m_{ij}$.
For $v \in V(G_n) = [n]$ and $1 \le i \le k$, let $p_{v,i}$ be the fraction of the subinterval of
$[0,1]$ corresponding to the vertex $v$ that lies in $P_i$, noting that $\sum_i p_{v,i} = 1$ for
each $v$, and $\sum_v p_{v,i} = n/k$ for each $i$. Form a random partition $\Pi' = (P'_1, \ldots, P'_k)$
as follows: put each vertex $v$ into a random part $P'_{i_v}$ with $\mathbb{P}(i_v = i) = p_{v,i}$, with
the choices independent for different vertices $v$.
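
The randomized rounding step just described is easy to simulate. The following Python sketch is only an illustration: the function name `random_partition` and the input format (a dictionary mapping each vertex $v$ to the list $(p_{v,1}, \ldots, p_{v,k})$) are assumptions made here, not notation from the paper.

```python
import random


def random_partition(fractions, rng):
    """Place each vertex v in part i with probability fractions[v][i].

    `fractions` maps each vertex to a list of k non-negative numbers
    summing to 1; `rng` is a random.Random instance. Choices for
    different vertices are independent.
    """
    k = len(next(iter(fractions.values())))
    parts = [set() for _ in range(k)]
    for v, probs in fractions.items():
        i = rng.choices(range(k), weights=probs)[0]
        parts[i].add(v)
    return parts
```

A vertex whose subinterval lies entirely inside one part $P_i$ has $p_{v,i} = 1$ and is always placed in $P'_i$; only vertices split between parts are placed at random.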

It is immediate that $\mathbb{E}(|P'_i|) = n/k$ and $\mathrm{Var}(|P'_i|) \le n/k$. It follows that for
some constant $C$ we have

$$\forall i : \bigl||P'_i| - n/k\bigr| \le C\sqrt{n} \qquad (68)$$

with probability at least 0.99. Writing $v \sim w$ if $vw \in E(G_n)$, for $1 \le i,j \le k$
we have

$$\mathbb{E}\bigl(e(P'_i, P'_j)\bigr) = \sum_{v \sim w} \mathbb{P}(i_v = i,\ i_w = j) = \sum_{v \sim w} p_{v,i}\, p_{w,j} = n^2 p \int_{P_i \times P_j} \kappa_{G_n}(x,y)\, dx\, dy,$$

so the expectation of $e(P'_i, P'_j)/(n^2 p)$ is exactly $m_{ij}/k^2$. For edges $vw$, $v'w'$ of
$G_n$, the random variables $1_{i_v = i} 1_{i_w = j}$ and $1_{i_{v'} = i} 1_{i_{w'} = j}$ are independent unless
$vw$ and $v'w'$ share a vertex, in which case their covariance is at most one. It
follows that $\mathrm{Var}(e(P'_i, P'_j))$ is bounded by $2\hom(P_2, G_n)$; the factor 2 arises
since we may put the common vertex of two incident edges into $P_i$ or $P_j$. But
$\hom(P_2, G_n) \le 2e(G_n)\Delta(G_n)$, which is $o(n^4 p^2)$ by assumption. Hence, for any
$\varepsilon$, the probability that we have

$$\Bigl|\frac{e(P'_i, P'_j)}{n^2 p} - \frac{m_{ij}}{k^2}\Bigr| \le \varepsilon \qquad (69)$$

for every $i$ and $j$ with $1 \le i, j \le k$ is at least 0.99, provided $n$ is large enough.

From the comments above, if $n$ is large enough, there is a partition $\Pi'$ for
which both (68) and (69) hold. Starting from such a partition and moving at
most $O(\sqrt{n}) = o(n)$ vertices of $L$ (the set of low-degree vertices) between parts,
we may find a balanced partition with almost the same density matrix. In other
words, we may find an element of $\mathcal{M}_k(G_n)$ close to $M$, completing the proof. □

If $np \to \infty$, then the condition of Lemma 6.1 that $\Delta(G_n) = o(n^2 p)$ holds
trivially, since $\Delta(G_n) \le n = o(n^2 p)$. When $np$ is bounded, this condition is
necessary. Taking $G_n$ to be a star, for example, every partition of $V(G_n)$ has
the property that there is one part meeting all edges. But the corresponding
kernel has partitions which are very far from having this property, namely those
in which, roughly speaking, the central vertex of the star has been split between
parts.



6.2 The relationship between the cut and partition metrics

We now turn to the main result of this section, showing the equivalence of $d_{\mathrm{cut}}$
and $d_{\mathrm{part}}$ under mild assumptions. The key idea of the proof is that one can
identify the density matrix corresponding to a weakly $(\varepsilon,p)$-regular partition
from the set of density matrices.

Theorem 6.2. Let $np \to \infty$, and let $(G_n)$ be a sequence of graphs with $|G_n| = n$
satisfying the bounded density assumption 4.1. Let $\kappa$ be a bounded kernel. Then
$d_{\mathrm{part}}(G_n, \kappa) \to 0$ if and only if $d_{\mathrm{cut}}(G_n, \kappa) \to 0$.

Proof. Suppose first that $d_{\mathrm{cut}}(G_n, \kappa) \to 0$, i.e., that $d_{\mathrm{cut}}(\kappa_{G_n}, \kappa) \to 0$. If $\kappa_1$
and $\kappa_2$ are any kernels with $d_{\mathrm{cut}}(\kappa_1, \kappa_2) < d$, and $M \in \mathcal{M}_k(\kappa_1)$, then there
is an $M' \in \mathcal{M}_k(\kappa_2)$ whose entries differ from those of $M$ by at most $k^2 d$:
one simply takes the corresponding partition for $\kappa_2$, after rearranging so that
$\|\kappa_1 - \kappa_2\|_{\mathrm{cut}} < d$. It follows that $d_H(\mathcal{M}_k(\kappa_1), \mathcal{M}_k(\kappa_2)) \le k^2 d_{\mathrm{cut}}(\kappa_1, \kappa_2)$. Hence,
$d_H(\mathcal{M}_k(\kappa_{G_n}), \mathcal{M}_k(\kappa)) \to 0$. Using Lemma 6.1 it follows that $d_{\mathrm{part}}(G_n, \kappa) \to 0$.

Now suppose that $d_{\mathrm{part}}(G_n, \kappa) \to 0$. By the index $\mathrm{ind}(M)$ of a density matrix
$M = (m_{ij}) \in \mathcal{M}_k(\kappa')$ we mean simply $k^{-2} \sum_{i,j} m_{ij}^2$. Let $f(k, \varepsilon) > k$ be a function
to be specified later. A $k$-by-$k$ density matrix $M \in \mathcal{M}_k(\kappa')$ is locally $\varepsilon$-optimal
for a kernel $\kappa'$ if

$$\sup_{\ell \le f(k,\varepsilon)} \ \sup_{M' \in \mathcal{M}_\ell(\kappa')} \mathrm{ind}(M') \le \mathrm{ind}(M) + \varepsilon,$$

i.e., if $M$ has almost maximal index among density matrices with not too many
parts; the definition of local optimality for $M \in \mathcal{M}_k(G_n)$ is similar.
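
The index and the local optimality condition translate directly into code. In the Python sketch below, which is illustrative only, the names `index` and `is_locally_optimal` are hypothetical, a matrix is a list of rows, and the finite list `candidates` is an assumed stand-in for the union of the sets $\mathcal{M}_\ell(\kappa')$ over $\ell \le f(k,\varepsilon)$, over which the supremum is taken.

```python
def index(M):
    """Index of a k-by-k density matrix: k^(-2) times the sum of squared entries."""
    k = len(M)
    return sum(m * m for row in M for m in row) / k ** 2


def is_locally_optimal(M, candidates, eps):
    """Check that no candidate matrix has index exceeding index(M) + eps.

    `candidates` is a finite, non-empty list of density matrices,
    possibly of different sizes.
    """
    return max(index(Mp) for Mp in candidates) <= index(M) + eps
```

For example, the one-part-per-block matrix `[[2, 0], [0, 2]]` has index 2, while its coarsening `[[1, 1], [1, 1]]` has index 1, so the coarser matrix is not locally $\varepsilon$-optimal against the finer one for any $\varepsilon < 1$.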

Fix $\varepsilon > 0$. Since $(G_n)$ has bounded density, whenever $n$ is large enough as a
function of $k$, any density matrix in $\mathcal{M}_k(G_n)$ has index at most some constant
$C$. It follows that there is a $K = K(C, \varepsilon)$ such that, for $n$ large enough, every
$G_n$ has some locally $\varepsilon$-optimal density matrix $M_k(n)$ of size at most $K$. (This
statement is a key part of the proof of Szemeredi's Lemma.)

Since $d_{\mathrm{part}}(G_n, \kappa) \to 0$, if $n$ is large enough, there is an $M'_k(n) \in \mathcal{M}_k(\kappa)$
with all entries within $\varepsilon/(10C)$ of those of $M_k(n)$. It follows that $\mathrm{ind}(M'_k(n)) \ge
\mathrm{ind}(M_k(n)) - \varepsilon/2$. Similarly, for $n$ large, every $M' \in \bigcup_{\ell \le f(k,\varepsilon)} \mathcal{M}_\ell(\kappa)$ has all
entries within $\varepsilon/(10C)$ of some $M \in \bigcup_{\ell \le f(k,\varepsilon)} \mathcal{M}_\ell(G_n)$, which implies

$$\mathrm{ind}(M') \le \mathrm{ind}(M) + \varepsilon/2 \le \mathrm{ind}(M_k(n)) + 3\varepsilon/2 \le \mathrm{ind}(M'_k(n)) + 2\varepsilon,$$

using the assumption that $M_k(n)$ is locally $\varepsilon$-optimal for $G_n$ for the second
inequality. Thus $M'_k(n)$ is locally $2\varepsilon$-optimal for $\kappa$.

Recall that a partition $\Pi$ of $[0,1]$ is weakly $(\varepsilon,p)$-regular with respect to a
kernel $\kappa'$ if the corresponding averaged kernel $\kappa'/\Pi$ satisfies $\|\kappa'/\Pi - \kappa'\|_{\mathrm{cut}} \le \varepsilon$.

The proof of Lemma 4.3 (a sparse form of the Frieze-Kannan form of Szemeredi's
Lemma) shows that if $(G_n)$ has bounded density, then there is a function $f(k, \varepsilon)$
such that, if $n \ge n_0(k, \varepsilon)$ and $M \in \mathcal{M}_k(G_n)$ is locally $\varepsilon$-optimal, then the
corresponding partition of $\kappa_{G_n}$ is weakly $(\varepsilon,p)$-regular; the same applies to $\kappa$.
It follows that for $n$ large, identifying each density matrix with a corresponding
kernel, we have $d_{\mathrm{cut}}(\kappa_{G_n}, M_k(n))$, $d_{\mathrm{cut}}(M_k(n), M'_k(n))$ and $d_{\mathrm{cut}}(M'_k(n), \kappa)$ all of
order $O(\varepsilon)$. Since $\varepsilon$ was arbitrary, it follows that $d_{\mathrm{cut}}(\kappa_{G_n}, \kappa) \to 0$, as required.

□ 

In the light of Corollary 4.7, Theorem 6.2 implies that a sequence $(G_n)$
satisfying Assumption 4.1 is Cauchy with respect to $d_{\mathrm{part}}$ if and only if it is
Cauchy with respect to $d_{\mathrm{cut}}$.

The bounded density assumption in Theorem 6.2, which is trivially satisfied
in the dense case $p = \Theta(1)$, is necessary in general. This can be seen by
considering, for example, a graph $G_n$ made up of $n/m$ complete graphs of order $m$,
with $m \sim pn = o(n)$ chosen so that $G_n$ has $pn^2/2$ edges. By compactness, any
sequence with $e(G_n) = O(pn^2)$ has a subsequence that is Cauchy with respect
to $d_{\mathrm{part}}$ (here, in fact, the original sequence is Cauchy). However, it is easy to
check that no subsequence of $(G_n)$ is Cauchy with respect to $d_{\mathrm{cut}}$.

The proof of Theorem 6.2 applies just as well to kernels as to graphs (and one
can in any case approximate kernels by dense graphs), showing that $d_{\mathrm{part}}(\kappa_n, \kappa) \to 0$
if and only if $d_{\mathrm{cut}}(\kappa_n, \kappa) \to 0$. It follows that $d_{\mathrm{part}}$ induces a metric on $\mathcal{K}$,
the set of kernels quotiented by equivalence, and that $d_{\mathrm{part}}$ and $d_{\mathrm{cut}}$ give rise to
the same topology on $\mathcal{K}$. This was proved by Borgs, Chayes, Lovasz, Sos and
Vesztergombi [16] in their study of the dense case, as part of their Theorem 3.5.

7 Discussion and closing remarks



For dense graphs, with $\Theta(n^2)$ edges, the results of Borgs, Chayes, Lovasz, Sos
and Vesztergombi [15, 16] show that one single metric, say $d_{\mathrm{cut}}$, effectively
captures several natural notions of local and global similarity. Indeed, convergence
in $d_{\mathrm{cut}}$ is equivalent to convergence in the partition metric $d_{\mathrm{part}}$ (a natural global
notion) and to convergence in $d_{\mathrm{sub}}$, i.e., convergence of all small subgraph counts,
a natural local notion. These results apply to all sequences $(G_n)$ of graphs, but
if $G_n$ has $o(n^2)$ edges then they become trivial: any such sequence is Cauchy
with respect to any of the metrics, and indeed converges to the zero kernel. To
make interesting statements about sparse graphs one should adapt the metrics
so that, roughly speaking, given an 'edge density function' $p = p(n)$ satisfying
$p \to 0$, one compares a graph $G_n$ with $p\binom{n}{2}$ edges to the Erdos-Renyi random
graph $G(n,p)$ and its inhomogeneous variants rather than to $K_n$. Our main aim
in this paper has been to introduce such metrics, and to discuss the relationships
between them. In this final section we turn to a slightly different question, that
of the relationship between metrics and random graph models.

7.1 Models and metrics 

In the dense case, there is a very natural correspondence between limit points
of sequences converging in $d_{\mathrm{cut}}$, and the inhomogeneous random graph model
$G(n, \kappa)$. In general, given any metric, we can ask whether there is a
corresponding random graph model: for each metric $d$ on some class of (sparse) graphs
satisfying certain restrictions, we can ask the following question.

Question 7.1. Given a metric $d$, can we find a 'natural' family of random
graph models with the following two properties: (i) for each model, the sequence
of random graphs $(G_n)$ generated by the model is Cauchy with respect to $d$ with
probability 1, and (ii) for any sequence $(G_n)$ with $|G_n| = n$ that is Cauchy with
respect to $d$, there is a model from the family such that, if we interleave $(G_n)$
with a sequence of random graphs from the model, the resulting sequence is still
Cauchy with probability 1?

In the above question, we are implicitly assuming a coupling between the
probability spaces on which the graphs $(G_n)$ are defined. There is of course no
need to do so: we can replace 'Cauchy with probability 1' with the less familiar
'Cauchy in probability', which is equivalent to convergence in probability in the
completion; see Kallenberg [2^1 Lemma 4.6].

Although Question 7.1 is rather vague, for $d = d_{\mathrm{cut}}$ the answer is 'yes' in
the dense case, since $(G_n)$ is Cauchy if and only if $d_{\mathrm{cut}}(G_n, \kappa) \to 0$ for some
kernel $\kappa$, while the dense inhomogeneous random graphs $G(n, \kappa)$ converge to
$\kappa$ in $d_{\mathrm{cut}}$ with probability 1. Thus our family consists of one model $G(n, \kappa)$
for each kernel $\kappa$ (to be precise, for each equivalence class of kernels under the
relation $\sim$ defined in Subsection 2.4).

In the sparse case we do not have an entirely satisfactory answer for any of
the metrics considered in this paper. Assuming that $np \to \infty$, there is an
almost completely satisfactory answer for $d_{\mathrm{cut}}$: if we impose the bounded density
assumption 4.1, then Corollary 4.7 and Lemma 4.10 show that the sparse
inhomogeneous models $G_p(n, \kappa)$ answer Question 7.1. For $d_{\mathrm{sub}}$, defined with respect
to certain restricted sets of subgraphs, the results in Section 5 (in particular,
Theorem 5.18) show that once again $G_p(n, \kappa)$ answers this question for suitably
restricted sequences.

The extremely sparse case, where $p = \Theta(1/n)$, turns out to be even more
complicated; we shall discuss this in a forthcoming paper [11].

There is an even vaguer, but perhaps more important, 'mirror image' of
Question 7.1. Suppose that we have a random graph model, and we would
like to test whether it is appropriate for some network in the real world. Then
we would like to have a suitable metric to compare a 'typical' graph from the
model with the real-world network. It is too much to hope that one metric will
be appropriate in all situations; in particular, taking the simple case in which
our model is $G(n,p)$ for some $p = p(n) \to 0$, the unnormalized metrics $d_{\mathrm{cut}}$,
$d_{\mathrm{sub}}$ or $d_{\mathrm{part}}$, that are very suitable for dense graphs, will declare any graph
with $o(n^2)$ edges to be close to the model.

In general, a random graph model (or family of models) may suggest an
appropriate metric, or at least properties such a metric should have. For
example, the inhomogeneous models $G_p(n, \kappa)$ and the results here suggest the
sparse version of $d_{\mathrm{cut}}$. Suppose, however, that we are trying to model a
network with rather few edges but high 'clustering', i.e., many triangles and other
small subgraphs. One possible model is a denser version of the sparse random
graphs with clustering introduced by Bollobas, Janson and Riordan [9]: given,
for each fixed graph $F$, a 'kernel' $\kappa_F : [0,1]^{|F|} \to [0, \infty)$ and a normalizing
function $p_F(n)$, we choose vertex types $x_1, \ldots, x_n$ independently and uniformly
at random and then, for each $F$, add each possible copy of $F$ with vertex set
$v_1, \ldots, v_k$, $1 \le v_1 < v_2 < \cdots < v_k \le n$, with probability $\kappa_F(x_{v_1}, \ldots, x_{v_k})\, p_F(n)$.

In this model, a huge family of normalizations are possible: we can take
each $p_F$ to be any function of $n$ bounded by 1. Of course, certain restrictions
will be necessary for the model to make much sense; otherwise, for example, the
copies of some $F_1$ added directly may be swamped by copies of $F_1$ arising as
subgraphs of some $F_2$, in which case there was no point adding any copies of $F_1$
directly. However, there is no doubt that many different normalizations will be
interesting: for example, for any $0 < a < 4/3$, we can produce graphs with, say,
$\Theta(n^{4/3})$ edges and $\Theta(n^a)$ triangles. Indeed, to do so we need only two kernels,
one for edges (which we may take to generate a bipartite graph if needed), and
one for triangles.

If, for some reason, we are considering graphs with, say, around $n^{4/3}$ edges
and $n^{6/5}$ triangles, which is many more triangles than expected in $G(n, n^{-2/3})$,
then the triangles are an important part of the structure, so in comparing two
such graphs we should certainly compare the number of triangles, normalized
by dividing by $n^{6/5}$. This suggests a family of metrics generalizing $d_{\mathrm{sub}}$.

For each $F \in \mathcal{F}$ let $N_F = N_F(n)$ be a normalizing function satisfying $0 <
N_F \le \infty$. (We allow infinity to include the possibility of totally ignoring copies
of some $F$. In fact, $N_F = n^{|F|+1}$ will do just as well.) Then we may define
a subgraph metric associated to $N = (N_F)_{F \in \mathcal{F}}$ by modifying the definition of
$d_{\mathrm{sub}}$ given in Section 3, using the normalized count $\mathrm{emb}(F, G)/N_F(|G|)$ in place
of $s_p(F, G)$. This metric will only make sense for suitably restricted families of
graphs, but for such families, it will make much better sense than $d_{\mathrm{sub}}$.
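
As a concrete instance of such a normalized count, take $F$ to be the triangle. The Python sketch below is an illustration under stated assumptions: the names `triangle_embeddings` and `normalized_count` are hypothetical, the graph is given as a dictionary of neighbour sets with sortable vertex labels, and $\mathrm{emb}(F, G)$ is assumed to count injective homomorphisms, so that each triangle of $G$ contributes 6 embeddings of $K_3$.

```python
from itertools import combinations


def triangle_embeddings(adj):
    """Number of embeddings of the triangle: 6 per triangle of the graph."""
    triangles = sum(
        1
        for a, b, c in combinations(sorted(adj), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )
    return 6 * triangles


def normalized_count(emb, n, N_F):
    """Normalized count emb / N_F(n) for a normalizing function N_F.

    If N_F(n) is float('inf'), copies of F are ignored (the count is 0).
    """
    return emb / N_F(n)
```

With the normalization from the example above, one would call `normalized_count(triangle_embeddings(adj), len(adj), lambda n: n ** 1.2)` to compare triangle counts on the scale $n^{6/5}$.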

7.2 Closing Remarks 

The main aim of this paper is to draw attention to the possibility that there
is a rich theory of sparse (quasi-)random graphs waiting to be explored. The
beginnings of such a theory can be found in the papers of Bollobas, Janson
and Riordan [5J [S] in the very sparse case, and of Borgs, Chayes, Lovasz, Sos,
Szegedy and Vesztergombi [T31[II][31] in the dense case; it would be desirable to
build a theory encompassing these two extreme threads. As we have just shown,
this task is unlikely to be easy: there are numerous unexpected difficulties and
pitfalls, and much work has to be done even to arrive at concrete problems
whose solutions would represent genuine progress in this endeavour. In this
paper we have attempted to do some of this groundwork, and have identified
some intriguing problems.

Our main focus has been the introduction of normalized versions of the metrics
$d_{\mathrm{cut}}$, $d_{\mathrm{sub}}$ and $d_{\mathrm{part}}$, adapted to the study of graphs with $\Theta(pn^2)$ edges,
where $p = p(n) \to 0$. We have shown in Section 6 that (under a mild assumption)
$d_{\mathrm{cut}}$ and $d_{\mathrm{part}}$ have the same Cauchy sequences, and in Section 5
that (again under a mild assumption) these metrics have the property that any
sequence $(G_n)$ contains a subsequence converging to a kernel.

Turning to $d_{\mathrm{sub}}$, things become more difficult. We have conjectured that if
our $p$-normalized subgraph counts are suitably bounded and $p = p(n)$ is not
too small then an appropriate Cauchy sequence does converge to a kernel (see
Conjectures 3.3 and 3.4). Tantalizingly, we cannot even prove this convergence
in just about the simplest case, when we know that the limit has to be a constant
kernel (Conjecture 3.9).

Section 5 is devoted to the relationship between $d_{\mathrm{cut}}$ and $d_{\mathrm{sub}}$. A sound
understanding of the relationship between these two metrics, the cut and count
metrics, would bring us much closer to a proper theory of sparse inhomogeneous
quasi-random graphs. We have conjectured that under some natural and not
too restrictive conditions, these two metrics are equivalent in the sense that if
$(G_n)$ is a sequence of graphs that are not too 'lumpy' then $(G_n)$ converges to
a kernel $\kappa$ in the $p$-cut metric if and only if it converges to $\kappa$ in the $p$-count
metric (see Conjecture 5.6). As one of our main results, we have proved that
$p$-cut convergence does imply $p$-count convergence for a restricted set of subgraph
counts, under a mild assumption on the distribution of paths of certain lengths
(see Theorems 5.15 and 5.17).

The case of graphs of bounded average degree turns out to be even more 
difficult, and will be discussed in a companion paper [llj . 